GROUNDWORK OF 
MATHEMATICAL 
PROBABILITY
AND STATISTICS 


AMRITAVA GUPTA 


ACADEMIC PUBLISHERS • CALCUTTA • NEW DELHI






-à Debajyoti Das 
Statistics in Biology and Psychology 


GROUNDWORK OF 
MATHEMATICAL PROBABILITY AND 
STATISTICS 


AMRITAVA GUPTA 


Reader in Applied Mathematics, University of Calcutta 


ACADEMIC PUBLISHERS 
CALCUTTA • NEW DELHI


© Reserved by the author 


First Edition August 1962 
Second Edition September 1971 
Third Edition April 1983 


Price rupees forty only 




The paper used for printing this book has been made available at 
concessional rates by the Government of India. 


ACADEMIC PUBLISHERS 
5A, Bhawani Dutta Lane, Calcutta-700073 
and also at 


SHANTIMOHUN HOUSE, 
I-1/16, Ansari Road, New Delhi-110002 


To My Students 


who gave me the opportunity of learning the subject 


PREFACE TO THE THIRD EDITION 


After being out of print for quite some time, here is an enlarged 
third edition. The enlargement consists mainly in providing
partial solutions or broad hints to all relatively difficult problems
given as exercises, along with their answers. This should
be helpful to young learners. In Chapter 4 a section has been
added on Markov chains, which are interesting theoretically as well as
in applications.

As in the second edition of the book, use has been made of the 
simple notion of monotonic sequences of sets, with the help of which 
the basic properties of the probability function and the distribution 
function have been rigorously established. In fact, it is observed that 
if we are prepared to ignore the rather delicate question of measura- 
bility, we can develop without much effort and complexity a mathema- 
tical theory of probability which is logically quite accurate and fairly 
complete. 

I wholeheartedly thank the Academic Publishers for their keen 
interest and sincere co-operation in the publication of the present 
edition of the book. 


AMRITAVA GUPTA 
Calcutta, April, 1983 


PREFACE TO THE FIRST EDITION 


The present-day theories of probability and statistics have been 
placed on a sound logical basis, but a rigorous exposition of the same 
requires higher mathematical tools like the concepts of measure and 
integration which are used in almost all the authoritative works on 
the subject. On the other hand, the elementary text-books, perhaps 
with a view to avoiding difficult mathematics, do not pay adequate 
attention to the logical content and seek only to give a compilation 
of the various working formulae derived mainly from intuitive
considerations. As such a beginner who is not equipped enough to 
read the authoritative works has to depend on these elementary text 
books, and a searching mind often finds itself uncomfortable against 
the vague and loose concepts presented therein. Accordingly, in this 
book I have set myself to the task of giving a logically satisfactory 
and complete account of the general principles of the subject as much 
as possible within the reach of relatively simple mathematical tools. 
A simple knowledge of elementary analysis is a sufficient prerequisite
for reading this book; a few other mathematical tools
necessary, i.e. simple notions of sets, step functions etc., have been
developed in the course of the text.


I have started with the idea of random experiments and event 
spaces and subsequently developed an axiomatic theory of probability 
based on a set of simple axioms, particular emphasis being laid on
the fundamental concepts and the logical coherence of the development.
The problem of regression has been treated in some detail,
and certain departures from the traditional modes of treatment will 
be noticed in this connection. It was a pleasant surprise to find that 
correlation ratio can be defined as a correlation coefficient, from 
which many of its important properties immediately follow. The 
fundamental limit theorems like the central limit theorem, the 
continuity theorem for characteristic functions etc. have been 
assumed without proof, and only the simple consequences discussed. 




In statistics, although our major pre-occupation is with the 
mathematical aspects of the subject, some amount of descriptive 
elements and computational procedures have also been incorporated. 
In testing of hypotheses, the general Neyman-Pearson theory of best 
critical region and the method of likelihood ratio testing have been 
propounded, and all the important tests are deduced directly from these 
general principles. I have added a chapter on the theory of errors, 
which is of interest to many, treated in terms of modern statistical 
concepts and terminology. An attempt has also been made to give a 
unified version of the principle of least squares which does not 
usually appear in exactly the same form in different contexts like 
regression, theory of errors etc. 


I have made use of almost all the books I could lay hands on. 
The influences of Cramer, Feller, Kendall and Wilks, in particular, 
will be clearly discernible in the text. A bibliography appears at 
the end of the book. 


I am indebted to Sir Ronald A. Fisher, F. R. S., Cambridge and
to Dr. Frank Yates, F. R. S., Rothamsted, also to Messrs. Oliver &
Boyd Ltd., Edinburgh, for permission to reprint Tables No. II, III, IV,
V from their book ‘Statistical Tables for Biological, Agricultural
and Medical Research’. I acknowledge debt to Messrs. K. P. Sarker,
A. K. Sarkar and D. Sen for supplying me with many interesting
statistical data. I owe particular gratitude to my revered teacher and
colleague Dr. B. S. Ray whose constant inspiration was a guiding force
throughout the preparation of the book. I heartily thank Messrs.
S. Ray Chaudhury, J. Paul and P. K. Chatterjee for their assistance in
proof-reading and the publishers and the printers for their sincere
co-operation. Lastly, a great share of my thanksgiving is kept in store
for those who will in future help me by offering criticisms and 
suggestions for improvement. 


AMRITAVA GUPTA 
Calcutta, August 1962 


CONTENTS 


MATHEMATICAL PROBABILITY 


CHAPTER 1. Event Spaces 
1.1 Random experiments or observations—1.2 Events-simple and
compound—1.3 Mathematical tools: preliminary notions of
sets—1.4 The event space—1.5 Exercises


CHAPTER 2. Historical Background
2.1 Introduction—2.2 The classical definition—2.3 Statistical
regularity and the frequency definition of probability 


CHAPTER 3. Fundamental Axioms 
3.1 Axioms of mathematical probability—3.2 Conditional proba- 
bility—3.3 Stochastic independence —3.4 Exercises 


CHAPTER 4. Compound Experiments 
4.1 Cartesian product of sets—4.2 Joint independent experi- 
ments—4.3 Repeated independent trials —4.4 Bernoulli trials—4.5 
Poisson trials—4.6 Multinomial law—4.7 Infinite sequence of 
Bernoulli trials —4.8 Markov chains —4.9 Exercises 


CHAPTER 5. Probability Distributions 
5.1 Mathematical tools: functions on sets—5.2 Random vari- 
ables—5.3 Distribution function—5.4 Mathematical tools : step- 
function—5.5 Discrete distributions —5.6 Important discrete dis- 
tributions—5.7 Continuous distributions—5.8 Important con- 
tinuous distributions—5.9 Transformation of random variables— 
5.10 Exercises 


CHAPTER 6. Two-dimensional Distributions 
6.1 Distribution function in two dimensions—6.2 Discrete dis- 
tributions—6.3 Continuous distributions—6.4 Important two- 
dimensional or bivariate continuous distributions—6.5 Conditional 
distributions—6.6 Transformation of random variables in two 
dimensions—6.7 Extensions to many dimensions. Mutual inde- 
pendence—6.8 Exercises 

CHAPTER 7. Mathematical Expectations I
7.1 Mathematical expectation or mean value—7.2 Mean—7.3
Moments—7.4 Variance—7.5 Third central moment—7.6 Fourth
central moment—7.7 Moment generating function—7.8 Characteristic
function—7.9 Semi-invariants or cumulants—7.10
Median—7.11 Mode—7.12 Quantiles—7.13 Some remarks—7.14
Exercises


CHAPTER 8. Mathematical Expectations II
A. Two-dimensional Case—8.1 Expectation for a bivariate distribution—
8.2 Moments—8.3 Covariance, correlation coefficient—
8.4 Characteristic function—8.5 Some extensions to n-dimensions
B. Independent Random Variables—8.6 Multiplication rule for
expectations—8.7 Moments—8.8 Characteristic function—8.9
Another discussion on Bernoulli trials
C. Conditional Expectations and Regression—8.10 Conditional
expectation—8.11 Regression curves—8.12 Least square regression
curves—8.13 Regression lines—8.14 Parabolic curve fitting—8.15
Correlation ratio—8.16 Exercises


CHAPTER 9. Special Distributions
9.1 χ²-distribution—9.2 t-distribution—9.3 F-distribution—9.4
Exercises

CHAPTER 10. Convergence ‘in Probability’
10.1 Tchebycheff’s inequality—10.2 Convergence ‘in probability’—
10.3 Exercises


CHAPTER 11. Limit Theorems
11.1 Normal approximation to the binomial distribution—11.2
Fundamental limit theorems—11.3 Exercises


MATHEMATICAL STATISTICS 


CHAPTER 12. Random Samples
12.1 Populations and samples—12.2 Distribution of the sample— 
12.3 Tables and graphical representations—12.4 Sample charac- 
teristics—12.5 Computation of sample characteristics—12.6 
Exercises 


CHAPTER 13. Sampling Distributions
13.1 Sampling distributions of ‘statistics’—13.2 Estimates-consistent
and unbiased—13.3 Important sampling distributions—13.4
Normal population—13.5 Exercises


CHAPTER 14. Estimation of Parameters
14.1 Method of maximum likelihood—14.2 Applications to
different populations—14.3 Interval estimation—14.4 Method for
finding confidence intervals—14.5 Applications [illegible]
population—14.6 Approximate confidence intervals—14.7
Exercises


CHAPTER 15. Bivariate Samples
15.1 Sample from a bivariate population—15.2 Practical computation—
15.3 Least square curve fitting—15.4 Maximum likelihood
estimation—15.5 Exercises


CHAPTER 16. Testing of Hypotheses I 
16.1 Statistical hypotheses-simple and composite—16.2 General
form of a test. Best critical region—16.3 Best critical regions
for simple hypotheses—16.4 Applications to normal (m, σ) population—
16.5 Likelihood ratio testing—16.6 Normal (m, σ) population—
16.7 Comparison of normal populations—16.8 Bivariate
normal population—16.9 Exercises


CHAPTER 17. Testing of Hypotheses II 
17.1 Binomial (n, p) population—17.2 Comparison of binomial
populations—17.3 Poisson-μ population—17.4 Multinomial distribution—
17.5 Multinomial population—17.6 χ²-test of goodness
of fit—17.7 Exercises

CHAPTER 18. Theory of Errors
18.1 Introduction—18.2 The normal law—18.3 Some definitions—
18.4 Estimation—18.5 Weighted measurements—18.6 Indirect
observations—18.7 Exercises



MATHEMATICAL PROBABILITY 


CHAPTER 1 
EVENT SPACES 


1.1 RANDOM EXPERIMENTS OR OBSERVATIONS 


The word ‘probability’ figures very often in our everyday speech and 
in a wide variety of contexts; for example, ‘probably it will rain 
to-morrow’, ‘probably he is an honest man’, ‘the probability that there 
will be a bumper crop in the next season is very small’, ‘what is the 
probability of a double six in a throw of a pair of dice’ and so on.
Any attempt towards a theory of probability naturally begins with 
the question as to the probability of what we are interested in. The 
immediate answer is obviously—events. This, however, only introduces 
a general name which is in no way self-explanatory. Our first task then
will be to make precise the meaning of the term ‘event’ and the proper
context in which it will be used in our mathematical theory. For this,
we come to the idea of what are called random experiments or observa- 
tions. 

Let us take the case of tossing a coin. We know that there are 
two possible outcomes—‘head’ and ‘tail’, and that it is impossible 
to predict if the result of a toss will be a ‘head’ or ‘tail’. Consider a 
similar experiment of rolling a die from a box; there are only six 
possible results, viz. the faces marked 1, 2, ..., 6, but here also the
result of a particular throw is completely unpredictable. Or suppose
we are concerned with the measurement of a physical quantity by
means of a precision instrument. Students of physics know that the
result of a measurement does not exactly give the true value of the
quantity but a value close to it due to what are called experimental
errors. If repeated observations are taken, the measured values are
not again the same but fluctuate in an unpredictable manner. Here
we can take, at least for theoretical considerations, that the possible 
results comprise all the real numbers, but the number given by a 
single measurement cannot be exactly predicted. In our mathemati- 
cal theory, we shall only consider the class of those experiments or 




observations, for which we know a priori the set of all different 
possible results or outcomes, and which are such that it is impossible 
to predict which one of this set will occur at any particular perfor- 
mance of the experiment. Such experiments are called random 
experiments, the word ‘random’ pertaining to the above-mentioned 
lack of predictability. As such, if a random experiment is repeated 
under identical conditions, the results will vary at random. 


So far we have said nothing about the reasons for this randomness. 
The reasons are, however, manifold and not always clearly understood 
and, for that matter, not also essential for our theory of probability. 
Our theory, in fact, starts with accepting this idea of randomness 
and need not explain it. Still, in order to get a deeper glimpse into 
the situation, let us consider the process of, say, throwing a die.
The die is shaken well in a box and thrown on a table, and suppose
that the first result is ‘six’. If now the die is thrown again under
identical conditions, the result will not necessarily be ‘six’,
but may be any one of the six faces. This may seem somewhat
paradoxical if we reflect that the mechanical behaviour of
the die should be uniquely determined by the initial conditions of 
throwing and, of course, the laws of mechanics, and, as such, if the 
die is thrown under identical conditions, the results must also be 
the same. The explanation, however, lies in the fact that, although 


throwing a die looks a simple affair, it is a very complicated mechanical
process, and it is practically impossible to create exactly identical
initial conditions of throw. These conditions vary, however subtly,
at random, and this produces the unpredictable variability of results
in a sequence of repetitions of the experiment. Thus what we mean
by creating identical conditions is only keeping the relevant conditions
of the experiment as uniform as possible, and let us bear in mind
that the phrase identical conditions will henceforth be used only in
this approximate sense.


Consider now a slightly different case. Let two equally massive
particles be moving in a given field of force, being let go from the
same initial position and with the same initial velocity. If it is known
that the initial conditions are really identical, and if it is observed
that at a subsequent instant the particles occupy different positions,
we are faced with yet another type of, and a more difficult, logical



problem. It will be interesting to mention that such a problem is
not a hypothetical one, but was actually observed by the physicists
on a very small scale, viz. the atomic scale. Physicists were, however,
divided in their approach towards the explanation of such a phenomenon.
One school of thought took up a radically new outlook and
proclaimed that the laws of Nature are possibly not exactly fixed and
can make room for small random fluctuations which make themselves
felt on a small scale. This philosophy goes by the name of Principle
of Indeterminacy. The orthodox school, however, continued to believe
that Nature must be guided by perfectly deterministic laws, but it
might be that the classical laws break down on a small scale and need
to be replaced by possibly more complicated laws necessitating the use
of more complicated initial conditions.


To sum up, for reasons known or unknown there must be some 
intrinsic variability in the process of our experiments which would 
make them random. If, by increasing the perfection of the experi- 
mental process or otherwise, it is possible to get rid of this variability 
so that particular results become predictable, the experiment ceases 
to be random and is naturally pushed out of the realm of our 
probability theory. 


1.2 EVENTS—SIMPLE AND COMPOUND


The outcomes or results of a random experiment will be called events 
connected with the experiment, e.g. ‘head’ and ‘tail’ are results of the 
random experiment of throwing a coin and hence are events connected 
with it. We can distinguish between two types of events— simple 
and compound. To understand this, consider the experiment of 
rolling a die ; ‘one’, ‘two’, ..., ‘six’ are certainly events connected with
the experiment. Now the result ‘six’ can also be described under a 
different title, say, ‘even face’ or ‘multiple of three’, only that in 
the latter cases the result ‘six’ is not uniquely specified. Thus ‘even 
face’ and ‘multiple of three’ are also events. But the event ‘even 
face’ not only occurs when the result is ‘six’ but also when the result 
is ‘two’ or ‘four’, and we say that the event ‘even face’ can be 
decomposed into the events ‘two’, ‘four’ and ‘six’. Similarly, the 
event ‘multiple of three’ can be decomposed into the events ‘three’ 
and ‘six’. The events ‘one’, ‘two’ etc. cannot, however, be further




decomposed. Events which cannot be further decomposed are called 
simple events, and compound events are those which can be decomposed 
into simple events. 


An event which is sure to occur at every performance of the 
experiment is called a certain event. For example, ‘one or two or 
three...or six’ is a certain event in connection with throw of a die. 
We may also think of events which are logically impossible, i.e. which 
cannot occur at any performance of the experiment. Such events 
will be called impossible events. The event ‘seven’ can never 
be the result of throwing a die and is thus an impossible event. 
Clearly, a certain event can be decomposed into all the possible 


simple events, while an impossible event cannot be decomposed into 
any one of them. 


If, when an event occurs, another event invariably occurs, then the
former event is said to imply the latter event. For example, the event
‘two or four’ implies the event ‘even face’ for throwing a die. If
an event implies another event, then the simple events into which the
first event can be decomposed are also some of the simple events
into which the second event can be decomposed. Obviously, any
event implies the certain event.


Two events are said to be equivalent or identical if any one of them
implies and is implied by the other. Thus the events ‘two or four
or six’ and ‘even face’ are identical.


An event may be titled ‘either even face or multiple of three or
both’; this is a compound event which can be decomposed into the 
four simple events ‘two’, ‘three’, ‘four’ and ‘six’. 


The events ‘even face’ and ‘multiple of three’ occur simultaneously 


when and only when the result is ‘six’. In other words, we may say 
that the event ‘joint occurrence of even face and multiple of three’ can 
be decomposed into the simple event ‘six’, 

If two events are such that they cannot occur simultaneously, they
are said to be mutually exclusive, e.g. ‘even face’ and ‘odd face’ are
mutually exclusive events. We note that two distinct simple events are
always mutually exclusive, but compound events may or may not be so. Thus
the compound events ‘even face’ and ‘multiple of three’ are not mutually
exclusive.




An event which consists in the negation of another event is called 
the complementary event of the latter event. The complementary event 
of ‘multiple of three’ is obviously ‘not multiple of three’ which can be 
decomposed into the simple events ‘one’, ‘two’, ‘four’ and ‘five’. Note
that the complementary event of a certain event is an impossible event 
and vice versa. 


For recollection, take another example. Let the experiment consist 
in drawing a card at random from a well-shuffled pack of playing cards. 
There are 52 possible simple events. The event ‘spade’ can be 
decomposed into 13 simple events, and the event ‘queen’ into 4 simple 
events. The event ‘king or queen or jack of spades’ implies the event 
‘spade’. The events ‘spade’ and ‘queen’ are not mutually exclusive, 
and the event of their joint occurrence is the simple event ‘queen of 
spades’. The event ‘either spade or queen or both’ will be decomposed 
into 16 simple events. ‘Eleven of spades’ is an impossible event ; ‘any
card’ is a certain event. The complementary event of ‘spade’ is ‘club 
or heart or diamond’. 
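
The counts in the card example can be checked by direct enumeration. The following is only a sketch: it anticipates the set language of Section 1.3 by treating each event as the collection of simple events into which it decomposes, and the rank and suit labels are illustrative choices, not notation from the text.

```python
from itertools import product

# Rank and suit labels are illustrative choices, not notation from the text.
RANKS = ["A"] + [str(n) for n in range(2, 11)] + ["J", "Q", "K"]
SUITS = ["spade", "heart", "diamond", "club"]

# Every simple event is one card, written as a (rank, suit) pair.
deck = set(product(RANKS, SUITS))

spade = {card for card in deck if card[1] == "spade"}
queen = {card for card in deck if card[0] == "Q"}
court_spades = {("K", "spade"), ("Q", "spade"), ("J", "spade")}

print(len(deck), len(spade), len(queen))  # sizes of the three events
print(court_spades <= spade)              # 'king or queen or jack of spades' implies 'spade'
print(spade & queen)                      # joint occurrence of 'spade' and 'queen'
print(len(spade | queen))                 # 'either spade or queen or both'
print(("11", "spade") in deck)            # 'eleven of spades' is not a simple event
```

The union has 13 + 4 - 1 members because the queen of spades belongs to both events and is counted once.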


For a precise mathematical formulation of the concept of events 
connected with a random experiment, we would require some know- 
ledge of the theory of sets which we now develop. 


1.3 MATHEMATICAL TOOLS : PRELIMINARY NOTIONS 
OF SETS 


The aggregate or collection of all possible objects having given pro- 
perties will be called a set. The objects belonging to the set are called 
elements of the set. For example, the set of chairs in a particular 
room, the set of non-negative integers, the set of real numbers x such 
that a < x < b etc. 


When an element a belongs to a set S, we write in symbols $a \in S$.


If every element of a set A belongs to a set S, then we say that A is
contained in S, or S contains A, or that A is a subset of S and write
symbolically $A \subseteq S$ or $S \supseteq A$.


If $A \subseteq S$ and $S \subseteq A$, i.e. every element of A belongs to S and every
element of S belongs to A, then we say that the sets A and S are
identical or equal and write $A = S$.




If $A \subseteq S$, but $A \neq S$, i.e. every element of A belongs to S but there
is at least one element of S which does not belong to A, then A is said
to be a proper subset of S and written as $A \subset S$.


A null or an empty set is one which does not contain any element 
at all and will be denoted by O. 


We note the following : 
1. Every set is a subset of itself. 
2. An empty set is a subset of every set.


3. A set containing only one element is conceptually distinct from
the element itself but will be represented by the same symbol for the
sake of convenience.

Often we are concerned with the study of various subsets of a 
given set S. In such cases, it is customary to use a geometric terminology,
in which the elements of S are called points and the set S is called
a space. Let S be a given space, and $A, B, C, \ldots, A_1, A_2, \ldots, A_n, \ldots$
denote subsets of S.

The sum or union of two sets A and B is denoted by $A + B$ or
$A \cup B$ and is defined to be the set of all elements belonging to either A
or B or both. Note that $A + B$ is also a subset of S.

The product or intersection of two sets A and B is denoted by $AB$
or $A \cap B$ and is defined to be the set of all elements belonging to both
A and B. Then $AB \subseteq S$. If $AB = O$, the sets A and B are said to be
disjoint.


It is easily seen that the following laws hold for the above-defined
addition and multiplication of sets.


(i) $(A+B)+C = A+(B+C)$, $(AB)C = A(BC)$ (associative laws)

(ii) $A+B = B+A$, $AB = BA$ (commutative laws)

(iii) $A(B+C) = AB+AC$ (distributive law)
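
The three laws can be confirmed by brute force on a small space. This is a sketch under stated assumptions, not a proof: the four-point space and the helper name `all_subsets` are hypothetical, and Python's `|` and `&` stand for the sum and product of sets.

```python
from itertools import combinations

S = frozenset(range(4))  # a small hypothetical space with 4 points


def all_subsets(space):
    """Return every subset of `space` as a frozenset."""
    items = sorted(space)
    return [
        frozenset(combo)
        for size in range(len(items) + 1)
        for combo in combinations(items, size)
    ]


SUBSETS = all_subsets(S)  # 2**4 = 16 subsets


def laws_hold():
    """Check laws (i)-(iii) for every triple of subsets of S."""
    for a in SUBSETS:
        for b in SUBSETS:
            # (ii) commutative laws
            if a | b != b | a or a & b != b & a:
                return False
            for c in SUBSETS:
                # (i) associative laws
                if (a | b) | c != a | (b | c) or (a & b) & c != a & (b & c):
                    return False
                # (iii) distributive law
                if a & (b | c) != (a & b) | (a & c):
                    return False
    return True


print(len(SUBSETS), laws_hold())
```

An exhaustive check on one finite space only illustrates the laws; the general statements follow from the definitions of sum and product.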


By virtue of (i) and (ii) we can write without ambiguity
$A_1 + A_2 + \cdots + A_n$ and $A_1 A_2 \cdots A_n$,
where the order of terms or factors is arbitrary. Thus the sum
$A_1 + A_2 + \cdots + A_n$ is the set of all elements belonging to at least one of




the sets $A_1, A_2, \ldots, A_n$, and the product $A_1 A_2 \cdots A_n$ is the set of all
elements belonging to each one of the sets $A_1, A_2, \ldots, A_n$.

Let $B \subseteq A$. The difference $A - B$ is defined to be the set of all
elements of A which do not belong to B. In particular, the set $S - A$
will be called the complement of A in S and will be denoted by $\bar{A}$, i.e.
$\bar{A} = S - A$. It follows obviously that


$A + \bar{A} = S$, $A\bar{A} = O$ and $\bar{\bar{A}} = A$ (1.3.1)
For any two sets A and B we may write
$A + B = (A - AB) + AB + (B - AB)$ (1.3.2)


where the sets A-AB, AB and B-AB are pairwise disjoint. This 
formula can be verified from the following diagram (Fig. 1), in which
the region bounded by the bold line represents the set A+B, the 
shaded region represents AB, and the unshaded parts of A and B the 
sets A — AB and B - AB respectively. 


Fig. 1 
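
Formula (1.3.2) can also be checked numerically. The sketch below uses two arbitrary example sets; the function name `decompose` is a hypothetical helper, not something defined in the text.

```python
def decompose(a, b):
    """Return the three pieces (A - AB, AB, B - AB) of formula (1.3.2)."""
    ab = a & b
    return a - ab, ab, b - ab


# Two arbitrary example sets.
A = {1, 2, 3, 4}
B = {3, 4, 5}

left, middle, right = decompose(A, B)

# The three pieces must not overlap, and together they must rebuild A + B.
pairwise_disjoint = (
    not (left & middle) and not (middle & right) and not (left & right)
)
recombined = left | middle | right

print(left, middle, right)
print(pairwise_disjoint, recombined == A | B)
```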


Also note the following interesting results :
$A + A = A$, $AA = A$
$S + A = S$, $SA = A$ (1.3.3)
$\bar{S} = O$, $\bar{O} = S$
We may easily prove that
$\overline{A + B} = \bar{A}\,\bar{B}$ (1.3.4)
$\overline{AB} = \bar{A} + \bar{B}$
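
Before attempting a proof of (1.3.4), it may help to see both identities confirmed on every pair of subsets of a small space. This is a sketch only: the space of four letters and the helper names are assumptions made for illustration.

```python
from itertools import combinations

S = frozenset("abcd")  # a small hypothetical space


def complement(x):
    """Complement of x in S, i.e. S - x."""
    return S - x


def all_subsets(space):
    """Return every subset of `space` as a frozenset."""
    items = sorted(space)
    return [
        frozenset(combo)
        for size in range(len(items) + 1)
        for combo in combinations(items, size)
    ]


def de_morgan_holds():
    """Check both identities of (1.3.4) for all pairs of subsets of S."""
    subsets = all_subsets(S)
    return all(
        complement(a | b) == complement(a) & complement(b)
        and complement(a & b) == complement(a) | complement(b)
        for a in subsets
        for b in subsets
    )


print(de_morgan_holds())
```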




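Generalising to n sets $A_1, A_2, \ldots, A_n$ we have, if
$X = A_1 + A_2 + \cdots + A_n$, $Y = A_1 A_2 \cdots A_n$,


then
$\bar{X} = \bar{A}_1 \bar{A}_2 \cdots \bar{A}_n$, $\bar{Y} = \bar{A}_1 + \bar{A}_2 + \cdots + \bar{A}_n$ (1.3.5)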
Sequences of sets 
Consider now an infinite sequence $\{A_n\}$ of sets $A_n \subseteq S$ ($n = 1, 2, 3, \ldots$).
The sum or union
$A_1 + A_2 + A_3 + \cdots = \sum_{n=1}^{\infty} A_n$
is defined to be the set of all elements which belong to $A_n$ for at least
one value of n, and the product or intersection
$A_1 A_2 A_3 \cdots = \prod_{n=1}^{\infty} A_n$
is defined to be the set of all elements which belong to $A_n$ for every
value of n. We remark that these definitions of infinite sum and infinite
product are purely logical and are free from any limiting process.


The generalisations of (1.3.5) for a sequence of sets $\{A_n\}$ will be
formal. If


$X = \sum A_n$, $Y = \prod A_n$
then
$\bar{X} = \prod \bar{A}_n$, $\bar{Y} = \sum \bar{A}_n$ (1.3.6)
A sequence of sets $\{A_n\}$ is said to be monotonic non-decreasing or
expanding if $A_{n+1} \supseteq A_n$ for every value of n, and $\{A_n\}$ is said to be
monotonic non-increasing or contracting if $A_{n+1} \subseteq A_n$ for every n.


Let $\{A_n\}$ be an expanding sequence of sets. Then $\lim A_n$ as $n \to \infty$
is defined by


$\lim A_n = \sum A_n$
If $\{A_n\}$ is a contracting sequence of sets, then $\lim A_n$ is defined by
$\lim A_n = \prod A_n$
It may be easily seen that if $\{A_n\}$ is an expanding (or contracting)
sequence, then $\{\bar{A}_n\}$ is a contracting (or expanding) sequence, and it
follows from (1.3.6) that if $\{A_n\}$ is either expanding or contracting
$\lim \bar{A}_n = \overline{\lim A_n}$ (1.3.7)
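
A computer cannot hold an infinite sequence of sets, but a finite truncation already shows the mechanism behind (1.3.6) and (1.3.7). The sketch below is an illustration under that limitation: the space {1, ..., 8} and the sets A_n = {1, ..., n} for n = 1, ..., 5 are hypothetical choices, and the "limit" is simply the union of the finitely many sets.

```python
S = set(range(1, 9))                              # hypothetical space {1, ..., 8}
A = [set(range(1, n + 1)) for n in range(1, 6)]   # A_n = {1, ..., n}, n = 1..5


def complement(x):
    """Complement of x in S."""
    return S - x


A_bar = [complement(a) for a in A]

# {A_n} expands, so the sequence of complements must contract.
is_expanding = all(A[i] <= A[i + 1] for i in range(len(A) - 1))
is_contracting = all(A_bar[i] >= A_bar[i + 1] for i in range(len(A_bar) - 1))

# For an expanding sequence the limit is the union; for the contracting
# sequence of complements it is the intersection.
union_A = set().union(*A)
intersection_A_bar = set(S).intersection(*A_bar)

print(is_expanding, is_contracting)
print(complement(union_A), intersection_A_bar)
```

Both printed sets coincide, which is the finite counterpart of the statement that the limit of the complements equals the complement of the limit.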




1.4 THE EVENT SPACE
Let the random experiment be denoted by E.


The simple events connected with E will be called event points or
simply points, and the set of all possible event points the event space
S of E. Then the experiment E must be such that its event space S is
completely known.


Any subset A of S will be called an event connected with E, i.e. an
event is an aggregate of some of the event points.


The entire space S is the certain event, and the empty subset O is the
impossible event.


We say that an event A implies another event B if the set A is
contained in the set B, i.e. $A \subseteq B$. The events A and B are said to be
equivalent or identical if the sets A and B are identical.


For any two events A, B, the event ‘either A or B or both’ is defined
to be the set $A + B$, and the event ‘A and B occurring simultaneously’
to be the set $AB$. The events A and B are said to be mutually
exclusive if the sets A, B are disjoint, i.e. $AB = O$.


Let $A_1, A_2, \ldots$ be any finite or infinite sequence of events. Then
the sum $A_1 + A_2 + \cdots$ will be called the event of occurrence of at least
one of the events $A_1, A_2, \ldots$, and the product $A_1 A_2 \cdots$ the event of
simultaneous occurrence of all the events $A_1, A_2, \ldots$.


$\bar{A}$, the complement of A in S, will be naturally called the complementary
event of A. Since $\bar{\bar{A}} = A$, it follows that the complementary
event of a complementary event is the event itself. Also, since $A\bar{A} = O$
and $A + \bar{A} = S$, two complementary events are mutually exclusive, and
their sum is the certain event. The formulae $\bar{S} = O$, $\bar{O} = S$ immediately
show that the impossible and the certain events are complementary
events.


With obvious meanings we shall speak of expanding and contracting
sequences of events and their limiting events.
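
The set-theoretic definitions of this section can be tried out on the die of Section 1.2. The following is a minimal sketch, assuming the faces are coded as the integers 1 to 6; the function name `implies` is a hypothetical helper.

```python
S = {1, 2, 3, 4, 5, 6}   # event space of one throw of a die

even_face = {2, 4, 6}
multiple_of_three = {3, 6}
odd_face = S - even_face   # complementary event of 'even face'


def implies(a, b):
    """Event a implies event b when the set a is contained in the set b."""
    return a <= b


print(implies({2, 4}, even_face))       # 'two or four' implies 'even face'
print(even_face | multiple_of_three)    # 'either even face or multiple of three or both'
print(even_face & multiple_of_three)    # joint occurrence
print(even_face & odd_face == set())    # mutually exclusive events
print(S - multiple_of_three)            # 'not multiple of three'
print({7} & S == set())                 # 'seven' contains no event point
```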


Examples 
1. Let E denote the experiment of tossing a coin three times in 
succession. A typical event point is, say, ‘head, head, tail’ which may 




be denoted by the symbol (H, H, T). The event space S consists of
8 points $U_1, U_2, \ldots, U_8$ given by

$U_1 = (H, H, H)$, $U_2 = (H, H, T)$, $U_3 = (H, T, H)$

$U_4 = (T, H, H)$, $U_5 = (T, T, H)$, $U_6 = (T, H, T)$

$U_7 = (H, T, T)$, $U_8 = (T, T, T)$
and we write
$S = U_1 + U_2 + \cdots + U_8$
Let A denote the event ‘two heads’. Then A contains the 3 points

$U_2, U_3, U_4$, i.e. $A = U_2 + U_3 + U_4$.

If B be the event ‘head in the first trial’, then
$B = U_1 + U_2 + U_3 + U_7$
$A + B = U_1 + U_2 + U_3 + U_4 + U_7$

$AB = U_2 + U_3$

$A - AB = U_4$

$B - AB = U_1 + U_7$
We note that the events $AB$, $A - AB$ and $B - AB$ are pairwise mutually
exclusive, and formula (1.3.2) may be easily verified.


The event ‘no head or all tails’ is obviously the event point $U_8$, so

that the event ‘at least one head’ is the complementary event
$\bar{U}_8 = S - U_8 = U_1 + U_2 + \cdots + U_7$

Remark. It is perhaps clear that, while writing summation with 
U's, the symbols do not exactly denote the event points but the sets 
containing the individual points, so that a sum of U's is understood 
in the usual sense of sum of sets. It is again in the latter sense that 
event points will denote events in our theory. 
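
Example 1 can be reproduced by enumerating the event space. This is a sketch, assuming each event point is coded as a 3-tuple of the letters 'H' and 'T'; the variable names are illustrative.

```python
from itertools import product

# All 2**3 ordered results of three tosses.
S = set(product("HT", repeat=3))

A = {u for u in S if u.count("H") == 2}   # 'two heads'
B = {u for u in S if u[0] == "H"}         # 'head in the first trial'
AB = A & B

no_head = {("T", "T", "T")}
at_least_one_head = S - no_head           # complementary event of 'no head'

print(len(S), len(A), len(B), len(A | B))
print(AB, A - AB, B - AB)
print(len(at_least_one_head))

# Formula (1.3.2): three pairwise disjoint pieces whose sum is A + B.
print((A - AB) | AB | (B - AB) == A | B)
```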

2. Let E denote the experiment of placing two balls at random
into three cells. Now two cases will arise according as the balls are
distinguishable among themselves or not. It is, however, assumed
that the given cells are distinct.


(a) DISTINGUISHABLE BALLS. In this case the balls may be
represented by the symbols $B_1$, $B_2$. The event space S will contain
the following 9 points :

$U_1 = (B_1 \mid B_2 \mid -)$, $U_2 = (B_1 \mid - \mid B_2)$, $U_3 = (- \mid B_1 \mid B_2)$

$U_4 = (B_2 \mid B_1 \mid -)$, $U_5 = (B_2 \mid - \mid B_1)$, $U_6 = (- \mid B_2 \mid B_1)$

$U_7 = (B_1 B_2 \mid - \mid -)$, $U_8 = (- \mid B_1 B_2 \mid -)$, $U_9 = (- \mid - \mid B_1 B_2)$

If A denotes the event ‘one ball in the second cell’, then
$A = U_1 + U_3 + U_4 + U_6$

(b) INDISTINGUISHABLE BALLS. If the balls are indistinguishable,
the event points may be obtained by dropping the subscripts of the B’s
in case (a). On doing this, we note that the event points $U_1$, $U_4$
become identical, $U_2$, $U_5$ become identical and $U_3$, $U_6$ become identical,
giving only 6 points in the new event space S’, viz.

$U_1' = (B \mid B \mid -)$, $U_2' = (B \mid - \mid B)$, $U_3' = (- \mid B \mid B)$

$U_4' = (BB \mid - \mid -)$, $U_5' = (- \mid BB \mid -)$, $U_6' = (- \mid - \mid BB)$

Remark. We remark once for all that in all such problems the 
balls will be treated as distinguishable unless stated to the contrary. 
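
The two event spaces of Example 2 can be generated mechanically. The sketch below assumes one particular coding: a point for distinguishable balls is the pair (cell of the first ball, cell of the second ball), and a point for indistinguishable balls is the tuple of occupancy numbers of the three cells. The names `occupancy` and `CELLS` are illustrative.

```python
from itertools import product

CELLS = (1, 2, 3)

# Distinguishable balls: each ball independently goes into one of 3 cells.
distinguishable = set(product(CELLS, repeat=2))


def occupancy(point):
    """Number of balls in each cell, forgetting which ball is which."""
    return tuple(point.count(cell) for cell in CELLS)


# Indistinguishable balls: points with the same occupancy numbers coincide.
indistinguishable = {occupancy(p) for p in distinguishable}

# Event 'one ball in the second cell' in the distinguishable case.
A = {p for p in distinguishable if p.count(2) == 1}

print(len(distinguishable), len(indistinguishable), len(A))
print(sorted(indistinguishable))
```

Dropping the subscripts of the balls corresponds to applying `occupancy`, which merges the six points with the balls in different cells into three.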


3. Let E consist in counting the number of telephone calls on a
given trunkline during a fixed interval of time. The possible counts
are 0, 1, 2, ..., and there is no upper limit. Hence the event space
S consists of the set of all non-negative integers.

4. Let E consist in measuring the length of a rod by a precision
instrument. If we assume, for theoretical idealisation, that a measurement
may yield any real number, then the event space S will be the
set of all real numbers.

5. If E consists in observing the sex of a new-born baby, the event
space S contains only two points ‘boy’ and ‘girl’.


1.5 EXERCISES

1. Prove the formula (1.3.4).

2. Prove the formula (1.3.5).

3. Show that if the sequence of sets $\{A_n\}$ is expanding, then $\{\bar{A}_n\}$ is a
contracting sequence.

4. Let $A_n$ denote the interval $-\infty < x < n$ ($n = 1, 2, \ldots$). Prove that $\{A_n\}$ is an
expanding sequence of sets, and

$\lim A_n = (-\infty, \infty)$

5. If $A_n$ denotes the interval $-\infty < x < -n$ ($n = 1, 2, \ldots$), then show that
$\{A_n\}$ is contracting, and

$\lim A_n = O$

6. If $A_n$ denotes the half-open interval $a - \frac{1}{n} < x \le a$ ($n = 1, 2, \ldots$), then show
that $\{A_n\}$ is contracting, and $\lim A_n$ is the set containing a only.

7. If $A_n$ denotes the half-open interval $a < x \le a + \frac{1}{n}$ ($n = 1, 2, \ldots$), then prove
that $\{A_n\}$ is contracting, and $\lim A_n$ is the empty set.




CHAPTER 2 


HISTORICAL BACKGROUND 


2.1 INTRODUCTION 


The history of probability is a very fascinating topic, and the many 
interesting stories woven round it are largely well-known. We shall, 
in this chapter, only trace the mathematical development of the concept 
of probability, which will form the necessary background for the 
inception of the present-day axiomatic theory. The theory of
probability had a humble beginning in the games of chance connected
with gambling in France in the 17th century, and since then it has
passed through many phases of metamorphosis and has finally emerged
as a beautiful and sophisticated branch of mathematics. Towards
the beginning of the 19th century, Laplace put forward a formal defini- 
tion of probability which goes by the name of the classical definition. 
The classical theory thrived mainly on the diverse problems of games
of chance and very well served the popular needs. It was, however,
subsequently found out that the classical theory suffers from an
intrinsic logical weakness and must be placed on a more sound and an
entirely new basis in order to meet the requirements of the expanding
fields of its application, viz. statistics, economics, insurance, biology,
physics etc. A new point of view was explored by von Mises only in
the 1920’s, who gave a new definition of probability which we shall
call the frequency definition. The frequency definition, which involves
a limiting process, although vastly [...] theory, again showed signs of
mathematical inelegance and operational [...] ultimately got rid of by
[...] probability. In this book we [...] axiomatic theory within the scope
[...] In the rest of this chapter, we [...] and frequency definitions of [...]




the modern theory, which may otherwise seem somewhat strange and 
arbitrary. 

Consider a simple game of chance. A coin will be tossed; if the
result is ‘head’, I win and if it is ‘tail’, I lose. What is the chance
of my winning? Any layman, we believe, will answer immediately that
it is 50%, and if he is a little more careful, he will add: provided the
coin is a true or symmetrical one. Take the game of throwing a die.
If it is ‘a multiple of three’, I win, otherwise I lose. In this game,
what should be the correct ratio of betting for honest gambling? The
layman’s answer will undoubtedly be 1 : 2 in my favour, assuming,
however, that the die is symmetrical about all the six faces. In
a precise language, we say that the probability of a ‘head’ in a toss is
1/2, and that of the event ‘multiple of three’ connected with throwing
a die is 1/3 in case we make the convention of expressing probabilities
as fractions. But if the layman is asked how he obtained the numbers
1/2 and 1/3 in the above games, he will possibly be at a loss to give a
proper explanation and will simply say: from intuition and experience.


2.2 THE CLASSICAL DEFINITION


The classical definition bases itself mainly on intuition. Although
intuition is a difficult thing to analyse, yet, in the above cases, it
will be quite easy to trace the law of formation of the numbers 1/2 and 1/3.
In the random experiment of tossing a coin there are 2 points
in the event space, and the event ‘head’ contains only 1 point.
The ratio of the number of points contained in the event ‘head’ to
the total number of points in the event space gives the fraction 1/2.
In the second case, the event space contains 6 points, of which 2
are contained in the event ‘multiple of three’, and the ratio of 2 to
6 is the required number 1/3. In both cases, however, we assume
that all the points of the event space are mutually symmetrical. Now
to explain further how the ratio obtained by the above rule represents
the probability of an event is perhaps not possible and must be left
entirely to intuition. Thus in the classical theory, we have the
following definition of probability:

Let E be a random experiment such that its event space S
contains a finite number, say n, of event points, all of which are
known to be equally likely or mutually symmetrical. If any event
A connected with E contains m(A) of these event points, then the
probability of A, denoted by P(A), will be defined by

P(A) = m(A)/n (2.2.1)


Criticisms of the classical definition 


1. The classical definition, although it looks simple, has a grave
logical flaw. The use of such a definition requires a priori knowledge
of the fact that all the event points are equally likely. Let us examine 
the phrase equally likely a little more closely. How to conclude if the 
event points of a given space are equally likely? The available 
argument was that, if the event points are mutually symmetric, they 
may be taken to be equally likely. But the next question immediately 
crops up (but ironically it was delayed in history for nearly a century !) 
—mutually symmetric in what respects? This poses a really difficult 
problem, and it was found after many serious investigations that it is 
impossible to set forth definite general criteria for mutual symmetry of 
the event points, and any such set of criteria includes the tacit
assumption that the event points are symmetrical in the sense of probability
itself. This amounts to begging the concept of probability before we 
have defined it and is thus a logical vicious circle. Further, in the 
absence of definite criteria for mutual symmetry, the only way of 
decision in a particular problem rests plainly on intuition. Now 
intuitions of different persons cannot be forced to be unique, and 
consequently there existed a lot of controversies among mathematicians
of the classical school. It was found that, even in slightly
complicated problems of games of chance, it becomes really difficult 
to decide, even in a practical way, if the event points are mutually 
symmetrical or not, and different mathematicians gave different 
answers to many such problems. 


The above difficulties will become apparent if we consider the
simple experiment of a throw with a die. Let us try to find the
conditions under which the six points of this event space would be
mutually symmetric. To start with, we would naturally demand
that the die should have a perfectly regular cubical shape and 
should be made from perfectly homogeneous material. But are these 
conditions sufficient to ensure mutual symmetry of the event points ? 




In reply to this, we would possibly add, for fear of incompleteness, 
that the six faces of the die must be symmetrical with respect to all 
possible kinetic properties, e.g. the centre of gravity of the cube 
should coincide with its geometrical centre, the twelve moments 
of inertia corresponding to the rotations of the cube about its 
twelve edges must all be equal and so on. Although it is difficult, 
if not impossible, to count on fingers all the kinetic properties 
exhaustively, we can still ask the pertinent question if geometrical and 
kinetic symmetries are sufficient for the purpose and if the principles 
of mechanics are the only guiding factors for the result of a throw 
with a die. It is well-known that the veteran gamblers believe in what
is called the luck factor, and to them it would seem quite reasonable 
that the six different numbers inscribed on the six faces of an otherwise 
symmetrical die may yet produce difference in luck ! Humour apart, 
there is still another serious point to consider. It must not be forgotten
that talking about the die only does not tell the whole story about the
random experiment, which also includes the process of throwing the 
die from the box. This, as we have already remarked, is a very intricate 
and uncertain mechanical process, and, as such, it would be indeed 
impossible to find the conditions of symmetry of the event points 
for the process of throwing the die. All these arguments sufficiently
convince anyone that it is impossible to find appropriate criteria
for mutual symmetry, and if we still want to stick stubbornly to 
the idea of mutual symmetry, we are ultimately obliged to assume 
that the event points should be symmetrical from the point of view 
of probability itself, i.e. the phrase equally likely becomes synonymous 
with equally probable. This is a great weakness of the classical 
definition, and no sound mathematical theory can be hoped to be built 
on such a weak definition. 


2. Moreover, the classical theory has a very narrow compass of 
applications ; it is restricted to a small class of event spaces which 
contain only a finite number of so-called equally likely event points. 
With the help of such a theory, it will thus be impossible to 
treat the cases of unsymmetrical event points, e.g. the case of a 
loaded die or the sex of a new-born baby in which the two event 
points ‘boy’ and ‘girl’ cannot be assumed to be necessarily equally
likely, and the cases of an infinite number of event points, e.g.
measurement of a physical quantity and so on. With the development
of the theories of statistics and other prospective fields of application 
of probability, it was observed that the restricted classical event spaces 
exist almost nowhere outside the relatively unimportant domain of the 
games of chance, and the classical theory is utterly powerless to cope 
with the new requirements. 


2.3 STATISTICAL REGULARITY AND THE FREQUENCY
DEFINITION OF PROBABILITY


It thus became clear that the classical theory must be abandoned 
altogether, and the new concept of probability must spring from an 
entirely new premise. Leaving aside intuition, the layman's second 
guiding factor is experience. Let us see what our experience says
about random experiments. If a coin is to be tossed once, nothing can
be predicted about the result, simply because the experiment is
random. But if the coin is tossed a large number of times under
identical or uniform conditions, it is very interesting to note that
we will be able to tell much about the overall results from experience.
For example, if the coin is tossed 200 times we can say that the
event point ‘head’ will occur about 100 times or, in other words, the
ratio of the number of times ‘head’ occurs to the total number of
experiments will be approximately 1/2. If the sequence is made longer,
say 2000 times, we can safely predict that the above ratio will be
very close to 1/2 and so on.
In general we have the following empirical fact. Let a random
experiment E be repeated under identical or uniform conditions N
times. If an event A connected with E is found to occur N(A)
times, then N(A) will be called the absolute frequency or simply the
frequency of A, and the ratio N(A)/N the relative frequency or the
frequency ratio of A, denoted by f(A), i.e.

f(A) = N(A)/N (2.3.1)


It is observed that as N becomes larger and larger, the frequency 
ratio f(A) gradually tends to become more or less constant. This 
phenomenon of stability of frequency ratios for long sequences of 
repetitions of a random experiment is called statistical regularity. 
This may seem very surprising, and hard to justify logically, in view
of the fact that every repetition of the experiment is independent of all
other repetitions of the sequence and is quite open to result in any
event point. But this was confirmed by many accurate and laborious
experiments, which firmly established statistical regularity as an
observational fact.
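Statistical regularity can be illustrated, though of course not proved, by a simulated coin. The sketch below is only an illustration under stated assumptions: the pseudo-random generator stands in for a symmetrical coin, and the function name `frequency_ratio` and the seed are hypothetical choices, not part of the text.

```python
import random

def frequency_ratio(n_trials, seed=0):
    """Return f(head) = N(head)/N for n_trials simulated tosses.

    Assumption: rng.random() < 0.5 models a symmetrical coin.
    """
    rng = random.Random(seed)
    heads = sum(rng.random() < 0.5 for _ in range(n_trials))
    return heads / n_trials

# The frequency ratio settles near 1/2 as the sequence is made longer.
for n in (200, 2000, 200000):
    print(n, frequency_ratio(n))
```

For short sequences the ratio fluctuates noticeably; for long ones it stays close to 1/2, which is the stability described above.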

The new school utilised this phenomenon in formulating a
definition of probability. In this theory, we postulate that the
sequence f(A) = N(A)/N tends to a definite limit as N tends to infinity,
and this limit will be called the probability of the event A, to be
denoted by P(A), i.e.

P(A) = lim_{N→∞} f(A) (2.3.2)


Remarks 

1. In this theory we require that the random experiments must
be such that they can be repeated an indefinitely large number of
times under identical conditions. It may be remarked that this
imposes only a mild restriction on the random experiments, which is
satisfied in almost all problems of practical importance.

2. This definition is, however, broad enough to include unequally
likely as well as an infinite number of event points.


3. But the real strength and beauty of the theory lies in the 
fact that a definite operational meaning has been ascribed to 
probability, viz. for a long sequence of repetitions of a random 
experiment, the frequency ratio of a given event will be approximately 
equal to its probability, i.e. the number of times the event will occur 
is approximately equal to its probability times the total number of 
experiments. And this is the sense in which we can make use of our 
knowledge of probability. In contrast to this, the classical definition 
only intuitively satisfies our feeling for the word chance or probability.


Deduction of some important rules 
1. For any event A, 0 ≤ N(A) ≤ N or, dividing by N,
0 ≤ f(A) ≤ 1. In the limit as N → ∞,
0 ≤ P(A) ≤ 1 (2.3.3)
2. The frequency of the certain event, N(S) = N or f(S) = 1, so that
in the limit
P(S) = 1 (2.3.4)

The frequency of the impossible event, N(O) = 0 or f(O) = 0. Hence
P(O) = 0 (2.3.5)

3. Let A and B be any two mutually exclusive events, i.e. AB = O.
Clearly

N(A + B) = N(A) + N(B)
or N(A + B)/N = N(A)/N + N(B)/N
or f(A + B) = f(A) + f(B)
Making N → ∞,

P(A + B) = P(A) + P(B) (2.3.6)

If A, B, C be pairwise mutually exclusive, i.e. AB = O, BC = O,
CA = O, then the events A and B + C are also mutually exclusive, for
A(B + C) = AB + AC = O, and we have

P(A + B + C) = P(A) + P(B + C) = P(A) + P(B) + P(C)

In general, if A_1, A_2, ..., A_n be n pairwise mutually exclusive
events, i.e. A_i A_j = O (i ≠ j; i, j = 1, 2, ..., n), we have the following
addition rule:

P(A_1 + A_2 + ... + A_n) = P(A_1) + P(A_2) + ... + P(A_n) (2.3.7)
Conditional probability 


Consider two events A and B. Let us make the hypothesis that
the event A has occurred. Then in the sequence of N repetitions of
the random experiment E, we have to consider only a subsequence of
N(A) repetitions in which A has occurred, and among these N(A)
repetitions the number of times the event B also occurs (along with
A) is N(AB). The ratio N(AB)/N(A) will be called the conditional
frequency ratio of B on the hypothesis that A has occurred and denoted
by f(B|A), i.e.

f(B|A) = N(AB)/N(A) (2.3.8)

We assume that the limit of f(B|A) as N → ∞ exists, and this limit is
called the conditional probability of B on the hypothesis that A has
occurred, to be denoted by P(B|A). That is,

P(B|A) = lim_{N→∞} f(B|A) (2.3.9)

Now

f(B|A) = N(AB)/N(A) = {N(AB)/N}/{N(A)/N} = f(AB)/f(A)

As N → ∞ we get

P(B|A) = P(AB)/P(A)
provided P(A) ≠ 0.
Similarly
P(A|B) = P(AB)/P(B)
provided P(B) ≠ 0.
Hence, if P(A), P(B) ≠ 0, we have the multiplication rule:
P(AB) = P(A)P(B|A) = P(B)P(A|B) (2.3.10)
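The identity f(B|A) = f(AB)/f(A) on which the multiplication rule rests can be checked on simulated data. The following is a minimal sketch under assumptions that are not in the text: a pseudo-random die, the events A = ‘even face’ and B = ‘multiple of three’ chosen for illustration, and a hypothetical helper name `conditional_counts`.

```python
import random
from fractions import Fraction

def conditional_counts(n_trials, seed=1):
    """Return (N(A), N(AB)) for simulated throws of a die.

    Assumption: A = 'even face', B = 'multiple of three'.
    """
    rng = random.Random(seed)
    n_a = n_ab = 0
    for _ in range(n_trials):
        face = rng.randint(1, 6)
        in_a = face % 2 == 0
        in_b = face % 3 == 0
        n_a += in_a
        n_ab += in_a and in_b
    return n_a, n_ab

N = 60000
n_a, n_ab = conditional_counts(N)
f_a, f_ab = Fraction(n_a, N), Fraction(n_ab, N)
f_b_given_a = Fraction(n_ab, n_a)

# The identity holds exactly for frequency ratios, whatever the data.
assert f_b_given_a == f_ab / f_a
print(float(f_b_given_a))  # close to P(B|A) = 1/3 for a symmetrical die
```

The equality between the two ratios is exact arithmetic; only the closeness of f(B|A) to 1/3 depends on the length of the sequence.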


Criticisms of the new theory 


Although there is not much objection against the logical content 
of this theory, there is some inherent weakness or inelegance in the 
mathematical formalism. In this definition, we note that the frequency 
ratio is thoroughly an empirical concept, whereas the limit is 
postulated in a rigorous analytical sense. This combination of 
empirical and theoretical concepts is very inelegant and naturally 
leads to mathematical difficulties. Now this problem is not typical of 
probability theory only but arises in other branches of mathematics as 
well, e.g. theories of geometry. In geometry, we face the same 
difficult situation if we try to define the fundamental entities like a 
point, a straight line etc. We may attempt to define a point as the
limit of a sequence of chalk dots drawn on the blackboard of gradually 
decreasing dimensions, which will be somewhat similar to the above 
definition of probability. This is, however, not done in modern 
theories of geometry, in which point, straight line etc. remain 
undefined concepts, and we start with a system of axioms which 
specify the fundamental relations among them. Mathematicians, as 
we know, are habitually reluctant to accept things as new and, as 
such, would always try to define apparently new things in terms of 
things already known. But if a concept is radically new
and can in no way be explained in terms of old ideas, then the
question of a formal definition becomes meaningless. In the theory
of probability also, we are ultimately forced to give up the hope of 
defining probability and take recourse to an axiomatic theory in 
which probability is accepted as an undefined new concept, and only 
the salient rules for calculation of probabilities are postulated. These 
rules will, however, be chosen from the previous theories with 


necessary modifications for operational convenience. And for this, 
we go over to the next chapter. 



CHAPTER 3 
FUNDAMENTAL AXIOMS 


3.1 AXIOMS OF MATHEMATICAL PROBABILITY 


Let E be a random experiment described by the event space S and A
be any event connected with E, i.e. A ⊂ S. The probability of A is a
number associated with A, to be denoted by P(A ; E) or simply P(A),
such that the following axioms are satisfied:

I. P(A) ≥ 0.
II. The probability of the certain event, P(S) = 1.
III. If A_1, A_2, A_3, ... be a finite or infinite sequence of pairwise
mutually exclusive events, i.e. A_i A_j = O (i ≠ j; i, j = 1, 2, 3, ...), then
P(A_1 + A_2 + A_3 + ...) = P(A_1) + P(A_2) + P(A_3) + ...

This axiom is obviously the formula (2.3.7) of the previous theory
with an important extension to an infinite sequence of events and is
called the axiom of complete additivity.

Frequency interpretation. Now starting from the above axioms,
we can logically build up the mathematical structure of the theory of
probability. But in order that such a theory may also be meaningful
from the point of view of practical applications, we must
postulate the basic rule for connecting the ideal numbers, the
probabilities, with experience. This rule, not included in the axioms,
consists in the following frequency interpretation (not frequency
definition!) of probability.

If the random experiment E is repeated a large number of times
under identical or uniform conditions, the frequency ratio of any event
will be approximately equal to its probability, i.e. P(A) ≈ f(A), so that
f(A) can be taken to be an experimentally measured value of the
idealised number P(A), and the longer the sequence of repetitions of
E, the more accurate the measured value.


Remark. In view of the frequency interpretation, we are still
restricted to the class of random experiments which can be repeated a
large number of times, at least conceptually, under uniform conditions.


Simple deductions 
1. From (1.3.1), A + Ā = S, AĀ = O. Hence
1 = P(S) = P(A + Ā) = P(A) + P(Ā)
or
P(Ā) = 1 − P(A) (3.1.1)
2. Since S̄ = O, P(O) = P(S̄) = 1 − P(S) = 0, or
P(O) = 0 (3.1.2)

i.e. the probability of an impossible event is zero.

If, however, P(A) = 0, we cannot conclude A = O or A is impossible;
in this case, we say that A is stochastically impossible. (The word
stochastic means pertaining to probability.)

Similarly, if P(A) = 1, we say that A is stochastically certain.
3. P(A) = 1 − P(Ā) and since P(Ā) ≥ 0, we have
P(A) ≤ 1 (3.1.3)

4. Let A ⊂ B. Then B = A + (B − A), where A and B − A are
mutually exclusive, so that

P(B) = P(A) + P(B − A)
or
P(B − A) = P(B) − P(A) (3.1.4)
Further, since P(B − A) ≥ 0,
P(A) ≤ P(B) (3.1.5)


5. CLASSICAL DEFINITION. Let the event space S contain the
n points U_1, U_2, ..., U_n. Then

S = U_1 + U_2 + ... + U_n

Since any two event points are necessarily mutually exclusive,
U_i U_j = O (i ≠ j), and so

1 = P(S) = P(U_1) + P(U_2) + ... + P(U_n)
or

P(U_1) + P(U_2) + ... + P(U_n) = 1

If the event points are assumed to be equally probable, we have

P(U_1) = P(U_2) = ... = P(U_n)
= {P(U_1) + P(U_2) + ... + P(U_n)}/n = 1/n
If now any event A contains m event points, say, U_1, U_2, ..., U_m,
then

A = U_1 + U_2 + ... + U_m
So
P(A) = P(U_1) + P(U_2) + ... + P(U_m) = m/n
For clarity, writing m(A) in place of m, we get the classical formula

P(A) = m(A)/n (3.1.6)


6. GENERAL ADDITION RULE. We shall now extend the addition
rule to events which may not be, in general, mutually exclusive.
Consider any two events A and B. The events A − AB, AB and B − AB
are always pairwise mutually exclusive, and we have

A = (A − AB) + AB, B = AB + (B − AB)
and
A + B = (A − AB) + AB + (B − AB)
Then
P(A) = P(A − AB) + P(AB), P(B) = P(AB) + P(B − AB)
P(A + B) = P(A − AB) + P(AB) + P(B − AB)
Eliminating P(A − AB) and P(B − AB) from the above equations, we get
the general addition rule:
P(A + B) = P(A) + P(B) − P(AB) (3.1.7)
For three events A, B, C
P(A + B + C) = P(A) + P(B + C) − P{A(B + C)}
= P(A) + P(B) + P(C) − P(BC) − P(AB + AC)
= P(A) + P(B) + P(C) − P(BC) − P(AB) − P(AC)
+ P(AB.AC)
Noting that AB.AC = AABC = ABC, we have
P(A + B + C) = P(A) + P(B) + P(C) − P(BC) − P(CA)
− P(AB) + P(ABC) (3.1.8)
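The rules (3.1.7) and (3.1.8) can be checked by direct counting on a small classical event space. The sketch below assumes a symmetrical die and three illustrative events; the helper `P` and the particular sets A, B, C are hypothetical choices made only for this check.

```python
from fractions import Fraction

S = set(range(1, 7))  # event space of one throw of a die

def P(event):
    """Classical probability m(A)/n on the event space S."""
    return Fraction(len(event & S), len(S))

A = {2, 4, 6}   # even face
B = {3, 6}      # multiple of three
C = {4, 5, 6}   # face at least four

# General addition rule for two events, (3.1.7).
two_lhs = P(A | B)
two_rhs = P(A) + P(B) - P(A & B)

# General addition rule for three events, (3.1.8).
three_lhs = P(A | B | C)
three_rhs = (P(A) + P(B) + P(C)
             - P(B & C) - P(C & A) - P(A & B)
             + P(A & B & C))

print(two_lhs, two_rhs)      # both 2/3
print(three_lhs, three_rhs)  # both 5/6
```

Here the set union `|` plays the role of the sum A + B and the intersection `&` the role of the product AB.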


Generalising for n events
P(A_1 + A_2 + ... + A_n) = P(A_1) + P(A_2) + ... + P(A_n)
− P(A_1A_2) − P(A_1A_3) − ... − P(A_{n−1}A_n)
+ P(A_1A_2A_3) + P(A_1A_2A_4) + ... + P(A_{n−2}A_{n−1}A_n)
− ... + (−1)^{n−1} P(A_1A_2...A_n) (3.1.9)
7. If {A_n} is a monotonic sequence of events, then
P(lim A_n) = lim P(A_n) (3.1.10)

Proof. First assume that {A_n} is monotonic non-decreasing or
expanding, for which

\[ \lim A_n = \sum_{n=1}^{\infty} A_n \]
Setting
B_1 = A_1, B_n = A_n − A_{n−1} (n ≥ 2)
{B_n} is a sequence of pairwise mutually exclusive events such that
\[ \sum_{n=1}^{\infty} A_n = \sum_{n=1}^{\infty} B_n \]
Also for every n

\[ A_n = \sum_{i=1}^{n} B_i \]

Using Axiom III we have

\[ P(\lim A_n) = P\Big(\sum_{n=1}^{\infty} B_n\Big) = \sum_{n=1}^{\infty} P(B_n)
 = \lim_{n\to\infty} \sum_{i=1}^{n} P(B_i) = \lim_{n\to\infty} P\Big(\sum_{i=1}^{n} B_i\Big) = \lim_{n\to\infty} P(A_n) \]

Next consider the case in which {A_n} is a monotonic non-increasing
or contracting sequence of events. Then we know that the sequence of
complements {Ā_n} is expanding so that by the above result

\[ P(\lim \bar{A}_n) = \lim P(\bar{A}_n) \]
But by (1.3.7)
\[ P(\lim \bar{A}_n) = P(\overline{\lim A_n}) = 1 - P(\lim A_n) \]

and lim P(Ā_n) = lim {1 − P(A_n)} = 1 − lim P(A_n), whence the result
(3.1.10) follows.


Examples 


In the following examples we assume that any event space in question is
classical in nature, i.e. contains a finite number of event points, all of
which are equally probable, unless stated otherwise.


1. A coin is tossed 3 times in succession. Find the probability of (a) 2 heads,
(b) 2 consecutive heads.

We have already discussed the event space of this random experiment in
Ex. 1, Sec. 1.4. The total number of points in the space, n = 8. Let A denote the
event ‘2 heads’. Then A contains 3 event points, viz. (H, H, T), (H, T, H),
(T, H, H), i.e. m(A) = 3. By (3.1.6)

P(A) = m(A)/n = 3/8

Let B be the event ‘2 consecutive heads’. B consists of the 2 points
(H, H, T) and (T, H, H) so that m(B) = 2. Hence
P(B) = 2/8 = 1/4

2. Two dice are thrown. Find the probability that the sum of the faces
equals or exceeds 10.

Here n = 36. Let A, B, C denote the events ‘sum 10’, ‘sum 11’ and ‘sum 12’
respectively. Then A + B + C is the required event, the probability of which is to
be computed.

Now the event A contains the 3 points (4, 6), (5, 5), (6, 4), the event B
the 2 points (5, 6), (6, 5) and C the only point (6, 6). Then
m(A) = 3, m(B) = 2, m(C) = 1
P(A) = 3/36, P(B) = 2/36, P(C) = 1/36
Since A, B, C are pairwise mutually exclusive, we have by Axiom III
P(A + B + C) = P(A) + P(B) + P(C) = 1/6
3. A die is rolled. If the result is either an even face or a multiple of three,
I win. What is the probability of my winning?

Let A = ‘even face’, B = ‘multiple of three’. The event space S and the events
A, B are represented by the adjoining diagram (Fig. 2). A and B are not
mutually exclusive, but contain one point in common, viz. 6. Clearly then,

n = 6, m(A) = 3, m(B) = 2, m(AB) = 1
or
P(A) = 3/6, P(B) = 2/6, P(AB) = 1/6
Hence, by (3.1.7), the probability of my win
= P(A + B) = P(A) + P(B) − P(AB) = 2/3.




Another method. We may also directly count the number of points contained
in the event ‘either an even face or a multiple of three’. It is 4, and hence the
required probability is 4/6 = 2/3.


4. A die is thrown k times in succession. Find the probability of obtaining
six at least once.

A typical event point of this space may be represented by a succession of
k integers ranging from 1 to 6, say, (5, 3, 1, ..., 4).

The total number of points in the event space is obviously the number of ways
in which k places can be filled by 6 different things, repetitions being allowed, and
hence n = 6^k.

Let A denote the event ‘at least one six’. In this case, it will be easier to
calculate the probability of the complementary event Ā which is ‘no six’. By
the same reasoning as above Ā will contain 5^k points, or

m(Ā) = 5^k, P(Ā) = (5/6)^k
By (3.1.1)

P(A) = 1 − (5/6)^k
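The complement argument of Ex. 4 can be compared with a direct count of event points for small k. This is a minimal sketch; the function names `p_at_least_one_six` and `p_by_enumeration` are hypothetical, and the enumeration is only practical for small k since the space has 6^k points.

```python
from fractions import Fraction
from itertools import product

def p_at_least_one_six(k):
    """Probability of at least one six in k throws: 1 - (5/6)^k."""
    return 1 - Fraction(5, 6) ** k

def p_by_enumeration(k):
    """Count the event points of the 6^k space that contain a six."""
    outcomes = list(product(range(1, 7), repeat=k))
    favourable = sum(1 for u in outcomes if 6 in u)
    return Fraction(favourable, len(outcomes))

for k in range(1, 5):
    print(k, p_at_least_one_six(k), p_by_enumeration(k))
```

Both columns agree for every k tried, as the complement rule (3.1.1) requires.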


5. A card is drawn at random from each of two well-shuffled packs of cards.
What is the probability that at least one of them is queen of spades?

Let A denote ‘the first card is queen of spades’, and B ‘the second card is
queen of spades’. Here n = 52^2.

More precisely, A represents the event ‘the first card is queen of spades and
the second anything’, so that m(A) = 52. Similarly, m(B) = 52.

Therefore

P(A) = P(B) = 52/52^2 = 1/52

The event AB contains only one point, viz. ‘both cards are queens of spades’
or m(AB) = 1, and so P(AB) = 1/52^2.

Now A + B is the required event, and

P(A + B) = P(A) + P(B) − P(AB) = 1/52 + 1/52 − 1/52^2 = 103/2704

Remark. For calculating P(A), we may be tempted to consider the event
space of the first draw only which contains a total of 52 points, of which one is
contained in A so that P(A) = 1/52. The answer is correct, but that would be
running intuitively ahead of logic, and the reader is advised to refrain from such
intuitive short-cuts at the initial stage, and think clearly in terms of the exact
event space of the random experiment in question.

Another method. Let A denote the event ‘at least one card is queen of spades’.
Then Ā is the event ‘none of them is queen of spades’, and easily we get
m(Ā) = 51^2. Hence

P(A) = 1 − P(Ā) = 1 − (51/52)^2 = 103/2704




6. An urn contains N = N_1 + N_2 balls, of which N_1 are white and
N_2 black. (a) A ball is drawn at random from the urn. What is the
probability that it is white? (b) If n balls are drawn, find the
probability that among these exactly i balls are white.

(a) The balls of the same colour are assumed to be distinguishable
among themselves, and hence when one ball is drawn the total number
of points in the event space is N, of which N_1 are contained in the
event ‘white ball’. Therefore, the probability of drawing a white ball
is N_1/N. Similarly, the probability of drawing a black ball is
N_2/N = 1 − N_1/N.

(b) In this case an event point will be a (disordered) group of
n balls, and the total number of event points is the number of different
groups of n balls that can be formed out of N different balls, which
is $\binom{N}{n}$. If among the n balls drawn i balls are white, the remaining
n − i balls must be black. Hence the event ‘i white balls’ will contain
$\binom{N_1}{i}\binom{N_2}{n-i}$ event points, and as such its probability is

\[ \binom{N_1}{i}\binom{N_2}{n-i} \Big/ \binom{N}{n} \qquad (3.1.11) \]
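Formula (3.1.11) is easy to evaluate exactly with integer arithmetic. The following is a minimal sketch; the function name `p_white` and the small urn used in the usage lines are illustrative assumptions, and the function expects 0 ≤ i ≤ n ≤ N_1 + N_2.

```python
from fractions import Fraction
from math import comb

def p_white(n1, n2, n, i):
    """Probability that exactly i of n balls drawn are white, by (3.1.11).

    n1, n2 are the numbers of white and black balls in the urn.
    """
    return Fraction(comb(n1, i) * comb(n2, n - i), comb(n1 + n2, n))

# Hypothetical urn: 3 white and 2 black balls, 2 balls drawn.
print(p_white(3, 2, 2, 1))                         # 3/5
print(sum(p_white(3, 2, 2, i) for i in range(3)))  # 1
```

The probabilities for i = 0, 1, ..., n add up to 1, since these events are mutually exclusive and exhaust the event space.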
Stirling’s formula. We know how laborious it is to calculate the
factorials of large numbers, and often it is convenient to have a formula
for computing the numerical values of large factorials approximately.
This is Stirling’s formula which states that
\[ n! = \sqrt{2\pi}\, n^{n+1/2}\, e^{-n+\theta/(12n)} \quad (0 < \theta < 1) \qquad (3.1.12) \]

Since θ/12n → 0 as n → ∞, we have for large values of n
\[ n! \approx \sqrt{2\pi}\, n^{n+1/2}\, e^{-n} \qquad (3.1.13) \]

This approximation formula is fairly accurate, as we may see that for
n = 10 the error is 0.8%, and for n = 100 it is only 0.08%.
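The accuracy quoted for Stirling’s approximation (3.1.13) can be checked numerically. This is a minimal sketch using floating-point arithmetic; the function names `stirling` and `relative_error` are hypothetical, and n should stay moderate (about n ≤ 140) so that n^(n+1/2) does not overflow a float.

```python
import math

def stirling(n):
    """Approximate n! by sqrt(2*pi) * n^(n + 1/2) * e^(-n)."""
    return math.sqrt(2 * math.pi) * n ** (n + 0.5) * math.exp(-n)

def relative_error(n):
    """Relative error of the approximation against the exact n!."""
    exact = math.factorial(n)
    return (exact - stirling(n)) / exact

for n in (10, 100):
    print(n, f"{100 * relative_error(n):.3f}%")
```

The relative error behaves like 1/(12n), in agreement with the factor θ/12n in (3.1.12).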
7. What is the probability that a bridge hand of 13 cards contains one ace?

Let us restate the problem in the following manner: A pack contains 52 cards,
of which 4 are aces and 48 other cards; 13 cards are drawn from the pack; to find
the probability that among these 13 cards one is an ace. Thus we see that this
problem fits in exactly with the model of Ex. 6, and by (3.1.11) the required
probability is
\[ \binom{4}{1}\binom{48}{12} \Big/ \binom{52}{13} = 0.44 \]


8. The urn problem of Ex. 6 may be easily generalised to balls of
more than two different colours. Let the urn contain N = N_1 + N_2
+ ... + N_m balls, of which N_1 are of the first colour, N_2 of the second
colour, ..., and N_m of the mth colour. The probability of drawing a
ball of the kth colour is then N_k/N (k = 1, 2, ..., m).

If n = i_1 + i_2 + ... + i_m balls are drawn from the urn, the probability
that of these i_1 are of the first colour, i_2 of the second colour, ...,
and i_m of the mth colour is, by arguments similar to those in Ex. 6,

\[ \binom{N_1}{i_1}\binom{N_2}{i_2}\cdots\binom{N_m}{i_m} \Big/ \binom{N}{n} \qquad (3.1.14) \]


9. Find the probability that a bridge hand will contain 5 spades, 4 hearts,
3 diamonds and 1 club.

By (3.1.14) the required probability is
\[ \binom{13}{5}\binom{13}{4}\binom{13}{3}\binom{13}{1} \Big/ \binom{52}{13} = 0.0054 \]
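The several-colour formula (3.1.14) can be evaluated in the same way as the two-colour one. The sketch below is only an illustration; the function name `p_colour_counts` and its list-based arguments are assumptions made for this example.

```python
from fractions import Fraction
from math import comb, prod

def p_colour_counts(sizes, draws):
    """Probability of drawing draws[k] balls of colour k, by (3.1.14).

    sizes[k] is the number of balls of colour k in the urn.
    """
    ways = prod(comb(n_k, i_k) for n_k, i_k in zip(sizes, draws))
    return Fraction(ways, comb(sum(sizes), sum(draws)))

# Bridge hand with 5 spades, 4 hearts, 3 diamonds and 1 club.
p = p_colour_counts([13, 13, 13, 13], [5, 4, 3, 1])
print(round(float(p), 4))  # 0.0054
```

With two colours the function reduces to (3.1.11), so the one-ace hand of Ex. 7 is `p_colour_counts([4, 48], [1, 12])`.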
10. From an urn containing N_1 white and N_2 black balls
(N = N_1 + N_2), balls are successively drawn without replacement. What
is the probability that i black balls will precede the first white ball?
Suppose all the balls are drawn one by one without replacement
and arranged in N different rooms. The total number of event
points is the number of ways in which N distinguishable balls can be
arranged in N rooms, i.e. N!. Now the required event means
that the first i rooms are occupied by black balls, the (i + 1)th room
by a white ball, and the last N − i − 1 rooms are filled by the remaining
N − i − 1 balls in any manner whatsoever. So the required event
contains
N_2(N_2 − 1)...(N_2 − i + 1) · N_1 · (N − i − 1)!
event points, and hence its probability is

\[ \frac{N_1 N_2 (N_2-1)\cdots(N_2-i+1)}{N(N-1)\cdots(N-i)} \qquad (3.1.15) \]

Clearly, this result holds for i ≥ 1. For i = 0, we may easily get by
direct computation that the required probability, which is indeed the
probability that the first drawing yields a white ball, is N_1/N.
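For a small urn the result of Ex. 10 can be compared with a count over all N! arrangements of the balls in the rooms. The following is a minimal sketch; the function names are hypothetical, the urn sizes are illustrative, and both functions assume n_white ≥ 1 and 0 ≤ i ≤ n_black.

```python
from fractions import Fraction
from itertools import permutations

def p_formula(n_white, n_black, i):
    """Probability (3.1.15); the empty product covers the case i = 0."""
    total = n_white + n_black
    num = n_white
    for t in range(i):
        num *= n_black - t
    den = 1
    for t in range(i + 1):
        den *= total - t
    return Fraction(num, den)

def p_enumerated(n_white, n_black, i):
    """Count arrangements with i black balls before the first white one."""
    balls = ["W"] * n_white + ["B"] * n_black
    orders = list(permutations(range(len(balls))))
    hits = sum(
        1 for order in orders
        if all(balls[j] == "B" for j in order[:i]) and balls[order[i]] == "W"
    )
    return Fraction(hits, len(orders))

for i in range(4):
    print(i, p_formula(2, 3, i), p_enumerated(2, 3, i))
```

For 2 white and 3 black balls the values are 2/5, 3/10, 1/5 and 1/10, which add up to 1 as they should.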

11. When n dice are thrown, find the probability that the sum of
the points on the dice has a prescribed value s.

Clearly the event space contains 6^n event points. The number of
event points contained in the required event is the number of different
sets of integers (x_1, x_2, ..., x_n) such that

x_1 + x_2 + ... + x_n = s

where x_1, x_2, ..., x_n can take values 1, 2, ..., 6. But this number is
again the coefficient of x^s in the expansion of
(x + x^2 + x^3 + x^4 + x^5 + x^6)^n. We have

x + x^2 + x^3 + x^4 + x^5 + x^6 = x(1 − x^6)/(1 − x)

and
\[ (1-x^6)^n = \sum_{i=0}^{n} (-1)^i \binom{n}{i} x^{6i}, \qquad (1-x)^{-n} = \sum_{j=0}^{\infty} \binom{n+j-1}{j} x^{j} \]
Hence

\[ (x + x^2 + \cdots + x^6)^n = x^n \sum_{i=0}^{n} \sum_{j=0}^{\infty} (-1)^i \binom{n}{i} \binom{n+j-1}{j} x^{6i+j} \]

If n + 6i + j = s, j = s − 6i − n and as j ≥ 0, i ≤ (s − n)/6. Since
$\binom{n+j-1}{j} = \binom{n+j-1}{n-1}$, the coefficient of x^s in the
above expansion is

\[ \sum_{i} (-1)^i \binom{n}{i} \binom{s-6i-1}{n-1} \]
where i ranges from 0 to the greatest integer ≤ (s − n)/6. As this is
also the number of event points the required event contains, its
probability is
\[ 6^{-n} \sum_{i} (-1)^i \binom{n}{i} \binom{s-6i-1}{n-1} \qquad (3.1.16) \]
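The alternating sum in (3.1.16) can be checked against a brute-force count of the 6^n event points for small n. The sketch below is an illustration only; the function names `p_sum_formula` and `p_sum_enumerated` are hypothetical, and sums outside the range n ≤ s ≤ 6n are given probability 0 by convention.

```python
from fractions import Fraction
from itertools import product
from math import comb

def p_sum_formula(n, s):
    """Probability that n dice show total s, by (3.1.16)."""
    if s < n or s > 6 * n:
        return Fraction(0)
    count = sum(
        (-1) ** i * comb(n, i) * comb(s - 6 * i - 1, n - 1)
        for i in range((s - n) // 6 + 1)
    )
    return Fraction(count, 6 ** n)

def p_sum_enumerated(n, s):
    """Count the event points (x_1, ..., x_n) whose sum is s."""
    hits = sum(1 for u in product(range(1, 7), repeat=n) if sum(u) == s)
    return Fraction(hits, 6 ** n)

print(p_sum_formula(2, 7))   # 1/6
print(p_sum_formula(3, 10))  # 1/8
```

For two dice and s = 7 only the term i = 0 contributes, giving 6 favourable points out of 36.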
12. If r balls are placed at random into n given cells, find the
probability that the 1st cell contains r_1 balls, the 2nd cell r_2 balls, ...,
and the nth cell r_n balls, where r_1 + r_2 + ... + r_n = r.
The balls, being distinguishable, may be numbered 1, 2, ..., r. The
total number of points in the event space is n^r, for the 1st ball can be
placed in any one of the n cells, and the same is true for the 2nd,
3rd, ..., rth balls. The number of points the required event contains
is the same as the number of permutations of r distinguishable balls such
that the first r_1 balls are placed in the 1st cell, the next r_2 balls in the
2nd cell, ..., and the last r_n balls in the nth cell, but, at the same time,
we ignore the order of the r_1 balls in the 1st cell, of the r_2 balls in the
2nd cell, ..., and of the r_n balls in the nth cell. Hence the number of
points contained in the required event is

r!/(r_1! r_2! ... r_n!)
and the required probability is
\[ \frac{r!}{r_1!\, r_2! \cdots r_n!} \cdot \frac{1}{n^r} \qquad (3.1.17) \]


Indistinguishable balls. In the next two examples we shall 
consider random distributions of indistinguishable balls into a number 
of cells which serve as important models in studying the behaviour of 
assemblages in small-particle physics, in which identical particles are 
represented by the indistinguishable balls and the given cells correspond 
to the various possible physical states of the particles. 


13. Ifr indistinguishable balls are placed at random into n different 


cells, find the probability that 1st cell contains r, balls, the 2nd cell 
rs balls,...... and the nth cell ғ, balls, where r, ++ traer. 


The event in question is a typical event point which may be represent- 
ed by the symbol 


(r, balls | rg balls | ...... | ra balls) 


* r,7 г and each r; can take values from 0 to r. 

To count how many event points are there in the space, we proceed 
as follows. In the above symbolical representation of the n cells we 
have n— 1 internal partitions and there are г balls, so that the total 
number of internal partitions and balls is n r-1. Mark the points 
1, 2,...n- r- 1 on the number axis. Any event point may be obtained 
by choosing п – 1 of these n+ r 1 marked points in which the partitions 


are inserted, the remaining r points being occupied by the r balls. 
Hence the total number of event points is 


(шен Eus ) 


where r, 4 ra e 


3.2] CONDITIONAL PROBABILITY 31 


We now give two possible solutions of the above problem. 


(а) In the first solution the event points 
equally probable but the probabil 
ity of the Same event 
the required Probability is 
most natural choic 
‘Maxwell-Boltzman 


are considered to be not 
ity of an event point is taken to be 
ifthe balls were distinguishable. 
given by (3.1.12, This is 


© and this model in physics 
n Statistics’, 


i.e. 
perhaps the 
is known as the 
(b) In the Second solution 


equally Probable so that the pr 
is an event Point, is 


all the event points 


are taken to be 
obability of the requir 


ed event, Which 
So a. ] v: 
[" н ) (3.1.18) 
In Physics this result goes b 


У the name of ‘Bose-Einstein statistics’, 


at in the above two different 
| he Problem, differe 
assigned to th 


5 of Probabilities have been 
© same event Space Biving rise to two different models, 
oth of whic are useful in Practice, 
n | Let , distinguishable balls be distributed at random to 
SAL Cells suc that а Cell is either empty or Occupied by а Single 
e 1e. асе Cannot Contain two Or more balls) Find the Probability 
the Is Cell containg гу balls, the 2nd cell "а balls, 
cell r, baj S, where p, 4 


T and the nth 
2 T [I е: + F 


ъ=г and each р, = O or |, 


ts is [s ) for this is obviously the 
Number f ways of à 
r bap; E YS о Selecting ,. Cells out of the п given Cells in Which the 
m Placed, the Test of the Cells being empty ow assuming 
Vent i ent Points аге equally Probable and Noting that the required 
4 typica] event Point, its Probability 
ni-a 
P i (3.1.19) 
1s assi n: : 
agis cela Probabilites IS known as the ‘Ferini- Dirge 
8; ў 
2 CONDITIONAL PROBABILITY 
The cond 


‘tional probability of an ey 


ШЕТТЕ 


30 FUNDAMENTAL AXIOMS [3.1 


placed in any one of the n cells, and the same is true for the 2nd,
3rd, ..., rth balls. The number of points the required event contains
is the same as the number of permutations of r distinguishable balls such
that the first $r_1$ balls are placed in the 1st cell, the next $r_2$ balls in the
2nd cell, ..., and the last $r_n$ balls in the nth cell, but, at the same time,
we ignore the order of the $r_1$ balls in the 1st cell, of the $r_2$ balls in the
2nd cell, ..., and of the $r_n$ balls in the nth cell. Hence the number of
points contained in the required event is

$$\frac{r!}{r_1!\, r_2! \cdots r_n!}$$

and the required probability is

$$\frac{r!}{r_1!\, r_2! \cdots r_n!\; n^r} \tag{3.1.17}$$
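
As a check on (3.1.17), the following Python sketch compares the formula with a direct enumeration of all $n^r$ placements of r distinguishable balls. The helper names are illustrative assumptions, not notation from the text.

```python
from fractions import Fraction
from itertools import product
from math import factorial

def occupancy_probability(counts) -> Fraction:
    """Formula (3.1.17): probability that cell k receives counts[k] balls."""
    n, r = len(counts), sum(counts)
    arrangements = factorial(r)
    for r_k in counts:
        arrangements //= factorial(r_k)   # each partial quotient is an integer
    return Fraction(arrangements, n ** r)

def occupancy_brute_force(counts) -> Fraction:
    """Count placements directly; placement[b] is the cell receiving ball b."""
    n, r = len(counts), sum(counts)
    target = tuple(counts)
    hits = 0
    for placement in product(range(n), repeat=r):
        if tuple(placement.count(cell) for cell in range(n)) == target:
            hits += 1
    return Fraction(hits, n ** r)

assert occupancy_probability((2, 1, 0)) == occupancy_brute_force((2, 1, 0))
```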


Indistinguishable balls. In the next two examples we shall 
consider random distributions of indistinguishable balls into a number 
of cells which serve as important models in studying the behaviour of 
assemblages in small-particle physics, in which identical particles are 
represented by the indistinguishable balls and the given cells correspond 
to the various possible physical states of the particles. 


13. If r indistinguishable balls are placed at random into n different
cells, find the probability that the 1st cell contains $r_1$ balls, the 2nd cell
$r_2$ balls, ..., and the nth cell $r_n$ balls, where $r_1 + r_2 + \cdots + r_n = r$.


The event in question is a typical event point which may be represented
by the symbol

($r_1$ balls | $r_2$ balls | ...... | $r_n$ balls)

where $r_1 + r_2 + \cdots + r_n = r$ and each $r_i$ can take values from 0 to r.

To count how many event points there are in the space, we proceed
as follows. In the above symbolical representation of the n cells we
have n - 1 internal partitions and there are r balls, so that the total
number of internal partitions and balls is n + r - 1. Mark the points
1, 2, ..., n + r - 1 on the number axis. Any event point may be obtained
by choosing n - 1 of these n + r - 1 marked points in which the partitions
are inserted, the remaining r points being occupied by the r balls.
Hence the total number of event points is

$$\binom{n+r-1}{n-1} = \binom{n+r-1}{r}$$

We now give two possible solutions of the above problem. 


(а) In the first solution the event points are considered to be not 
equally probable but the probability of an event point is taken to be 
the probability of the same event if the balls were distinguishable, i.e. 
the required probability is given by (3.1.17). This is perhaps the 
most natural choice, and this model in physics is known as the
‘Maxwell-Boltzmann statistics’.

(b) In the second solution all the event points are taken to be
equally probable so that the probability of the required event, which
is an event point, is

$$\frac{1}{\binom{n+r-1}{r}} \tag{3.1.18}$$

In physics this result goes by the name of ‘Bose-Einstein statistics’.


Remark. It is interesting to note that in the above two different
solutions of the problem, different sets of probabilities have been
assigned to the same event space giving rise to two different models, 
both of which are useful in practice. 

14. Let r indistinguishable balls be distributed at random to
n (> r) cells such that a cell is either empty or occupied by a single
ball (i.e. a cell cannot contain two or more balls). Find the probability
that the 1st cell contains $r_1$ balls, the 2nd cell $r_2$ balls, ..., and the nth
cell $r_n$ balls, where $r_1 + r_2 + \cdots + r_n = r$ and each $r_i$ = 0 or 1.

The total number of event points is $\binom{n}{r}$, for this is obviously the
number of ways of selecting r cells out of the n given cells in which the
r balls are placed, the rest of the cells being empty. Now assuming
that the event points are equally probable and noting that the required
event is a typical event point, its probability is

$$\frac{1}{\binom{n}{r}} \tag{3.1.19}$$

This assignment of probabilities is known as the ‘Fermi-Dirac
statistics’ in physics.
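
The three assignments (3.1.17), (3.1.18) and (3.1.19) can be placed side by side on a small event space. The Python sketch below lists every occupancy vector and checks that each assignment sums to unity. The function names are illustrative, and giving probability zero to vectors with a doubly occupied cell in the Fermi-Dirac case is a convention adopted here so that all three functions share one domain.

```python
from fractions import Fraction
from math import comb, factorial

def compositions(r: int, n: int):
    """Yield every vector (r_1, ..., r_n) of non-negative integers with sum r."""
    if n == 1:
        yield (r,)
        return
    for first in range(r + 1):
        for rest in compositions(r - first, n - 1):
            yield (first,) + rest

def maxwell_boltzmann(counts) -> Fraction:
    """Formula (3.1.17): the probability as if the balls were distinguishable."""
    n, r = len(counts), sum(counts)
    ways = factorial(r)
    for r_k in counts:
        ways //= factorial(r_k)
    return Fraction(ways, n ** r)

def bose_einstein(counts) -> Fraction:
    """Formula (3.1.18): all occupancy vectors equally probable."""
    n, r = len(counts), sum(counts)
    return Fraction(1, comb(n + r - 1, r))

def fermi_dirac(counts) -> Fraction:
    """Formula (3.1.19): at most one ball per cell, admissible vectors equally probable."""
    n, r = len(counts), sum(counts)
    if any(r_k > 1 for r_k in counts):
        return Fraction(0)          # convention of this sketch: inadmissible vector
    return Fraction(1, comb(n, r))

points = list(compositions(3, 4))   # r = 3 balls, n = 4 cells
for model in (maxwell_boltzmann, bose_einstein, fermi_dirac):
    assert sum(model(p) for p in points) == 1
```

For r = 3 and n = 4 the point (3, 0, 0, 0) receives probability 1/64 under the first assignment, 1/20 under the second and 0 under the third, which shows how differently the same event space is weighted.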


3.2 CONDITIONAL PROBABILITY 
The conditional probability of an event B on the hypothesis that
another event A has occurred will be denoted by P(B|A) and defined by

$$P(B|A) = \frac{P(AB)}{P(A)} \tag{3.2.1}$$

provided P(A) ≠ 0.

In case P(A) = 0, the conditional probability P(B|A) remains
undefined.


Interpretation. The interpretation of these newly defined numbers,
the conditional probabilities, will naturally be as follows. For a long
sequence of repetitions of the random experiment E under uniform
conditions, the conditional frequency ratio f(B|A) is taken to be
an approximate value of the conditional probability P(B|A).

Similarly, by definition

$$P(A|B) = \frac{P(AB)}{P(B)}$$

provided P(B) ≠ 0. Hence, if P(A), P(B) ≠ 0, we have

$$P(AB) = P(A)P(B|A) = P(B)P(A|B) \tag{3.2.2}$$


In case the conditional probabilities can be directly computed from
the conditions of the experiment, (3.2.2) gives us a formula for
calculating the probability of the product of two events and hence is
often called the multiplication rule.


Remark. It may seem somewhat paradoxical when we say
that the same equation (3.2.1) or (3.2.2) is used both to define the
conditional probabilities and as a multiplication rule. The conditional
probabilities are certainly new things in our theory and need
to be defined, and the latter statement of a multiplication rule is made
only in a practical sense. What we mean is that, if the conditional
probabilities can be determined in a practical manner by methods
other than using the definition itself, then the probability of the joint
occurrence of two events may be calculated by formula (3.2.2). This
situation is indeed slightly unhappy but has parallels in other
branches of mathematics as well. In mechanics, we remember, the
law force = mass × acceleration is primarily used to measure force
but also serves as an equation of motion if the force can be measured
by indirect means.


Generalisation. For three events A, B, C, we shall have

$$P(ABC) = P(A)\,P(B|A)\,P(C|AB) \tag{3.2.3}$$

Proof. R.H.S. $= P(A) \cdot \dfrac{P(AB)}{P(A)} \cdot \dfrac{P(ABC)}{P(AB)} = P(ABC)$

In general, for n events the multiplication rule is

$$P(A_1 A_2 \cdots A_n) = P(A_1)\,P(A_2|A_1)\,P(A_3|A_1 A_2) \cdots P(A_n|A_1 A_2 \cdots A_{n-1}) \tag{3.2.4}$$

1. P(AB|A) = P(B|A)

2. P(B+C|A) = P(B|A) + P(C|A) - P(BC|A)

If ABC = O,

P(B+C|A) = P(B|A) + P(C|A)

3. If the event space S contains n equally probable event points,
then P(A) = m(A)/n, P(AB) = m(AB)/n. Hence by (3.2.1)

$$P(B|A) = \frac{m(AB)}{m(A)} \tag{3.2.5}$$

Thus when we make the hypothesis that the event A has occurred,
we are, so to say, restricted to a new event space A. The portion
of B which is left in the new space is evidently AB which contains
m(AB) points, while the total number of points in this space is m(A),
so that formula (3.2.5) only expresses the classical rule in a modified
form.


Examples 
1. In Ex. 3, Sec. 3.1 compute the conditional probabilities P(B|A) and P(A|B),
where A and B denote the events ‘even face’ and ‘multiple of three’ respectively.

We have already found

n = 6, m(A) = 3, m(B) = 2, m(AB) = 1

Hence

P(A) = 3/6 = 1/2, P(B) = 2/6 = 1/3, P(AB) = 1/6

By definition

P(B|A) = P(AB)/P(A) = 1/3
P(A|B) = P(AB)/P(B) = 1/2

We may also calculate more easily by (3.2.5):

P(B|A) = m(AB)/m(A) = 1/3, P(A|B) = m(AB)/m(B) = 1/2



2. Two cards are drawn successively from a pack without replacing the first.
If the first card is a spade, find the probability that the second card is also a spade.

If A = ‘first card is a spade’ and B = ‘second card is a spade’, then AB = ‘both cards
are spades’. Thus

m(A) = 13 × 51, m(AB) = 13 × 12

and

$$P(B|A) = \frac{m(AB)}{m(A)} = \frac{13 \times 12}{13 \times 51} = \frac{4}{17}$$

We may also arrive at the same result by the following practical mode of
reasoning. When the first card is seen to be a spade, there remain in the pack 51
cards, of which 12 are spades. Hence the probability that the second card is also a
spade is 12/51 = 4/17.

If now we feel that we have learnt how to calculate the conditional probabilities
directly, we may attempt to solve the next problem by using the multiplication rule.

3. In Ex. 2 find the probability that both cards are spades.

As before

P(A) = 13/52 = 1/4, P(B|A) = 4/17

Hence

P(AB) = P(A)P(B|A) = 1/17

The result may be verified by direct computation by formula (3.1.6).
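
The counts used in Exs. 2 and 3 may also be confirmed mechanically. The Python sketch below enumerates all ordered pairs of distinct cards; the encoding of the pack (cards 0 to 51, of which 0 to 12 are taken to be the spades) is an assumption made purely for illustration.

```python
from fractions import Fraction
from itertools import permutations

def is_spade(card: int) -> bool:
    # Assumed encoding, not from the text: cards 0..12 are the spades.
    return card < 13

def spade_probabilities():
    """Return (P(A), P(B|A), P(AB)) by counting ordered pairs of distinct cards."""
    pairs = list(permutations(range(52), 2))       # 52 * 51 equally probable points
    m_a = sum(1 for first, _ in pairs if is_spade(first))
    m_ab = sum(1 for first, second in pairs if is_spade(first) and is_spade(second))
    p_a = Fraction(m_a, len(pairs))
    p_b_given_a = Fraction(m_ab, m_a)              # formula (3.2.5)
    p_ab = Fraction(m_ab, len(pairs))
    return p_a, p_b_given_a, p_ab

p_a, p_b_given_a, p_ab = spade_probabilities()
assert p_ab == p_a * p_b_given_a                   # multiplication rule (3.2.2)
print(p_a, p_b_given_a, p_ab)                      # 1/4 4/17 1/17
```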


4. POLYA'S URN PROBLEM. From an urn containing r red
and b black balls, n balls are successively drawn such that the ball
drawn is always replaced and, in addition, c balls of the colour drawn
are added to the urn. Find the probability of a complete run of
n black balls.


Let $A_i$ denote the event ‘ith ball is black’ (i = 1, 2, ..., n). Then
the required event is $A_1 A_2 \cdots A_n$. Clearly

$$P(A_1) = \frac{b}{r+b}$$

Now make the hypothesis that the event $A_1$ has occurred, i.e. the
1st ball is black, so that the urn now contains r red and b + c black
balls. Hence the conditional probability

$$P(A_2|A_1) = \frac{b+c}{r+b+c}$$

Similarly, it follows that

$$P(A_3|A_1 A_2) = \frac{b+2c}{r+b+2c}, \quad \ldots, \quad P(A_n|A_1 A_2 \cdots A_{n-1}) = \frac{b+(n-1)c}{r+b+(n-1)c}$$

By (3.2.4)

$$P(A_1 A_2 \cdots A_n) = \frac{b(b+c)(b+2c) \cdots [b+(n-1)c]}{(r+b)(r+b+c)(r+b+2c) \cdots [r+b+(n-1)c]}$$
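
The product obtained from (3.2.4) translates directly into a short loop. The following Python sketch, with an illustrative function name, multiplies the successive conditional probabilities; with c = 0 it reduces to ordinary drawing with replacement.

```python
from fractions import Fraction

def polya_all_black(r: int, b: int, c: int, n: int) -> Fraction:
    """Probability that the first n draws are all black in Polya's scheme.

    Before draw k + 1 (k = 0, ..., n - 1), given k black draws so far,
    the urn holds r red and b + k*c black balls.
    """
    prob = Fraction(1)
    for k in range(n):
        prob *= Fraction(b + k * c, r + b + k * c)
    return prob

# With c = 0 the draws are made with simple replacement.
assert polya_all_black(2, 3, 0, 4) == Fraction(3, 5) ** 4
# With r = 2, b = 3, c = 1: (3/5) * (4/6) = 2/5.
assert polya_all_black(2, 3, 1, 2) == Fraction(2, 5)
```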


5. MATCH OR RENCONTRE PROBLEM. From an urn containing
n tickets numbered 1, 2, ..., n, tickets are drawn successively without
replacement. If the kth ticket appears at the kth drawing, then we
have a match or rencontre. Find the probabilities of (a) at least one
match, (b) no match at all and (c) exactly i matches.

Let the tickets drawn be arranged in n different rooms. The
total number of event points is the number of ways in which n tickets
can be arranged in n rooms, which is n!.

Let the event $A_k$ be ‘match at the kth drawing’ (k = 1, 2, ..., n). Let
us first calculate $P(A_1 A_2 \cdots A_k)$. The number of event points which
$A_1 A_2 \cdots A_k$ contains is (n - k)!, since the rooms no. 1, 2, ..., k are filled
by tickets no. 1, 2, ..., k respectively so that the remaining n - k rooms
can be filled by the remaining n - k tickets in (n - k)! ways. Hence

$$P(A_1 A_2 \cdots A_k) = \frac{(n-k)!}{n!}$$


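(a) By (3.1.9) and using symmetry, the probability of at least one
match is

$$P(A_1 + A_2 + \cdots + A_n) = \binom{n}{1} P(A_1) - \binom{n}{2} P(A_1 A_2) + \cdots + (-1)^{n-1} \binom{n}{n} P(A_1 A_2 \cdots A_n)$$

$$= \sum_{k=1}^{n} (-1)^{k-1} \binom{n}{k} \frac{(n-k)!}{n!} = \sum_{k=1}^{n} \frac{(-1)^{k-1}}{k!}$$

(b) Probability of no match at all is

$$1 - P(A_1 + A_2 + \cdots + A_n) = \sum_{k=0}^{n} \frac{(-1)^k}{k!}$$
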
(c) From symmetry, the probability of exactly i matches is
$\binom{n}{i} P(A_1 A_2 \cdots A_i B)$, where B is the event ‘no match in the last n - i
drawings’. Now

$$P(A_1 A_2 \cdots A_i B) = P(A_1 A_2 \cdots A_i)\, P(B|A_1 A_2 \cdots A_i) = \frac{(n-i)!}{n!}\, P(B|A_1 A_2 \cdots A_i)$$

For calculating the conditional probability $P(B|A_1 A_2 \cdots A_i)$, we
note that if the event $A_1 A_2 \cdots A_i$ has occurred, then n - i tickets are left
in the urn, viz. tickets no. i + 1, i + 2, ..., n, and drawings no. i + 1,
i + 2, ..., n are yet to be made. Hence by replacing n by n - i in the result
of case (b)

$$P(B|A_1 A_2 \cdots A_i) = \sum_{k=0}^{n-i} \frac{(-1)^k}{k!}$$

Therefore the probability of exactly i matches is

$$\frac{1}{i!} \sum_{k=0}^{n-i} \frac{(-1)^k}{k!} \tag{3.2.6}$$

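For small n the result (3.2.6) can be compared with a direct count over all n! orders of drawing. The Python sketch below is an illustrative check with names chosen here; positions and tickets are numbered from 0 rather than 1, which does not affect the count of matches.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

def exactly_i_matches(n: int, i: int) -> Fraction:
    """Formula (3.2.6): (1/i!) * sum_{k=0}^{n-i} (-1)^k / k!."""
    tail = sum(Fraction((-1) ** k, factorial(k)) for k in range(n - i + 1))
    return tail / factorial(i)

def exactly_i_matches_brute_force(n: int, i: int) -> Fraction:
    """Count the orders of drawing in which exactly i tickets sit in their own place."""
    hits = 0
    for order in permutations(range(n)):
        matches = sum(1 for position, ticket in enumerate(order) if position == ticket)
        if matches == i:
            hits += 1
    return Fraction(hits, factorial(n))

for i in range(6):
    assert exactly_i_matches(5, i) == exactly_i_matches_brute_force(5, i)
```

Taking i = 0 gives case (b), and one minus that value gives case (a).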


Theorem. If $A_1, A_2, \ldots, A_n$ be a given set of n pairwise mutually
exclusive events, one of which certainly occurs, i.e. $A_i A_j = O$ (i ≠ j;
i, j = 1, 2, ..., n) and $A_1 + A_2 + \cdots + A_n = S$, then for any arbitrary event X

$$P(X) = P(A_1)P(X|A_1) + P(A_2)P(X|A_2) + \cdots + P(A_n)P(X|A_n) \tag{3.2.7}$$

and (Bayes' theorem) if P(X) ≠ 0

$$P(A_i|X) = \frac{P(A_i)P(X|A_i)}{P(A_1)P(X|A_1) + P(A_2)P(X|A_2) + \cdots + P(A_n)P(X|A_n)} \quad (i = 1, 2, \ldots, n) \tag{3.2.8}$$


Proof. We have

$$X = SX = (A_1 + A_2 + \cdots + A_n)X = A_1X + A_2X + \cdots + A_nX$$

Since $(A_iX)(A_jX) = A_iA_jX = OX = O$ (i ≠ j; i, j = 1, 2, ..., n), the events $A_1X$,
$A_2X, \ldots, A_nX$ are pairwise mutually exclusive, and hence

$$P(X) = P(A_1X) + P(A_2X) + \cdots + P(A_nX)$$

Since $P(A_iX) = P(A_i)P(X|A_i)$, the result (3.2.7) follows. Again

$$P(A_iX) = P(X)P(A_i|X)$$

Hence if P(X) ≠ 0

$$P(A_i|X) = \frac{P(A_iX)}{P(X)} = \frac{P(A_i)P(X|A_i)}{P(X)}$$

Then using (3.2.7), (3.2.8) is proved.


Remarks. 


1. Formula (3.2.7) is useful only when the conditional probabilities
$P(X|A_1), P(X|A_2), \ldots, P(X|A_n)$ can be more easily obtained than a
direct computation of P(X).

2. Formula (3.2.8) is known as Bayes' theorem. If we fancy to
call the events $A_1, A_2, \ldots, A_n$ causes of any event, then one of these
causes necessarily acts, and if any event X occurs, it must be due to
one of these causes, which is the reading of the equation
$X = A_1X + A_2X + \cdots + A_nX$. Now if the probabilities of the causes $P(A_1)$,
$P(A_2), \ldots, P(A_n)$ are known, and the probabilities of the occurrence of
X on the hypotheses that the different causes are acting, viz. $P(X|A_1)$,
$P(X|A_2), \ldots, P(X|A_n)$, can be calculated, then on the knowledge that
the event X has occurred, Bayes' theorem provides the rule for calculating
the probability that a particular cause $A_i$ was acting, i.e. $P(A_i|X)$.
Phrased this way, Bayes' theorem appears to be deceptively meaningful,
and in old days mathematicians tried to discover many philosophical
secrets with the help of this theorem (certainly not making very correct
use of it!). In modern thinking, however, all this is nonsense, and
Bayes' theorem has no deeper meaning than what is mathematically
stated above.

3. The above formulae (3.2.7) and (3.2.8) may be formally extended,
without difficulty, to an infinite sequence of causes, $\{A_n\}$.


Examples 
6. There are three identical urns containing white and black balls. The first
urn contains 2 white and 3 black balls, the second urn 3 white and 5 black balls,
and the third urn 5 white and 2 black balls. An urn is chosen at random, and a
ball is drawn from it. If the ball drawn is white, what is the probability that the
second urn is chosen?

Let $A_i$ denote the event ‘the ball is from the ith urn’ (i = 1, 2, 3). The events
$A_1, A_2, A_3$ are pairwise mutually exclusive, and one of these necessarily occurs.
We note that the event $A_i$ may be alternatively titled ‘the ith urn is chosen’, when,
from symmetry, we can write immediately $P(A_1) = P(A_2) = P(A_3) = 1/3$.

Let X denote the event ‘white ball’. Then easily we find

$P(X|A_1) = 2/5$, $P(X|A_2) = 3/8$, $P(X|A_3) = 5/7$

By (3.2.7)

$$P(X) = P(A_1)P(X|A_1) + P(A_2)P(X|A_2) + P(A_3)P(X|A_3) = \tfrac{1}{3} \cdot \tfrac{2}{5} + \tfrac{1}{3} \cdot \tfrac{3}{8} + \tfrac{1}{3} \cdot \tfrac{5}{7} = \frac{139}{280}$$

Then by (3.2.8)

$$P(A_2|X) = \frac{P(A_2)P(X|A_2)}{P(X)} = \frac{1/8}{139/280} = \frac{35}{139}$$

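The arithmetic of Ex. 6 is easily mechanised with exact fractions. In the Python sketch below the two helper functions are illustrative names for formulae (3.2.7) and (3.2.8); the priors and conditional probabilities are those found above.

```python
from fractions import Fraction

def total_probability(priors, likelihoods) -> Fraction:
    """Formula (3.2.7): P(X) = sum of P(A_i) P(X|A_i)."""
    return sum(p * l for p, l in zip(priors, likelihoods))

def bayes_posteriors(priors, likelihoods):
    """Formula (3.2.8): the list of P(A_i|X); requires P(X) != 0."""
    p_x = total_probability(priors, likelihoods)
    if p_x == 0:
        raise ValueError("P(X) = 0, so the conditional probabilities are undefined")
    return [p * l / p_x for p, l in zip(priors, likelihoods)]

priors = [Fraction(1, 3)] * 3                                   # P(A_1), P(A_2), P(A_3)
likelihoods = [Fraction(2, 5), Fraction(3, 8), Fraction(5, 7)]  # P(X|A_i)

print(total_probability(priors, likelihoods))    # 139/280
print(bayes_posteriors(priors, likelihoods)[1])  # 35/139
```

The three posterior probabilities necessarily add up to unity, which gives a quick check on any such computation.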
7. LAPLACE'S URN PROBLEM. There are (N + 1) identical urns
marked 0, 1, 2, ..., N, each of which contains N white and black balls.
The ith urn contains i black and N - i white balls (i = 0, 1, ..., N). An
urn is chosen at random, and n random drawings are made from it,
the ball drawn being always replaced. If all the n balls turn out to be
black, what is the probability that the next ball drawn will also be
black?

Let $A_i$ denote the event ‘the ith urn is chosen’. Then the events
$A_0, A_1, \ldots, A_N$ are pairwise mutually exclusive, one of which certainly
occurs. From symmetry, $P(A_i) = 1/(N+1)$. Let X be the event ‘all
n balls are black’.

By reasoning similar to that in Ex. 4, $P(X|A_i) = (i/N)^n$, and by (3.2.7)

$$P(X) = \sum_{i=0}^{N} P(A_i)P(X|A_i) = \frac{1}{N+1} \sum_{i=0}^{N} \left(\frac{i}{N}\right)^{n}$$

If Y = ‘the (n + 1)th ball is black’, then XY = ‘all the n + 1 balls are
black’. Replacing n by n + 1 in the above result

$$P(XY) = \frac{1}{N+1} \sum_{i=0}^{N} \left(\frac{i}{N}\right)^{n+1}$$

Hence the required conditional probability is

$$P(Y|X) = \frac{P(XY)}{P(X)} = \sum_{i=0}^{N} \left(\frac{i}{N}\right)^{n+1} \Big/ \sum_{i=0}^{N} \left(\frac{i}{N}\right)^{n}$$

The result assumes a simple interesting form if N is very large.
In that case the sums may be approximated by integrals, so that

$$P(X) \approx \int_0^1 x^n\,dx = \frac{1}{n+1}, \qquad P(XY) \approx \frac{1}{n+2}, \qquad P(Y|X) \approx \frac{n+1}{n+2}$$

This is called the law of succession of Laplace.
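
How quickly the exact ratio approaches (n + 1)/(n + 2) can be seen numerically. The Python sketch below evaluates the exact conditional probability for a finite number of urns; the function name is an illustrative choice.

```python
from fractions import Fraction

def laplace_next_black(N: int, n: int) -> Fraction:
    """Exact P(Y|X): the next ball is black given n black draws, with urns 0..N."""
    numerator = sum(Fraction(i, N) ** (n + 1) for i in range(N + 1))
    denominator = sum(Fraction(i, N) ** n for i in range(N + 1))
    return numerator / denominator

# Compare the exact value with the limiting value (n + 1)/(n + 2) for n = 3.
for N in (10, 100, 1000):
    exact = laplace_next_black(N, 3)
    print(N, float(exact), float(Fraction(4, 5)))
```

With N = 2 and n = 1 the exact value is 5/6, noticeably larger than the limiting value 2/3, so the approximation is meant only for large N.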


3.3 STOCHASTIC INDEPENDENCE


If P(B|A) = P(B), we may say that the information that the event A has
occurred does not affect the probability of the event B, and B is then
said to be stochastically independent of A. It follows from (3.2.2)
that if P(B|A) = P(B), then P(A|B) = P(A), i.e. A is stochastically
independent of B and P(AB) = P(A)P(B), provided, of course, P(A),
P(B) ≠ 0. Also the last equation implies both P(B|A) = P(B) and
P(A|B) = P(A).


Thus we define two events A and B to be stochastically independent
or simply independent if

$$P(AB) = P(A)P(B) \tag{3.3.1}$$

We agree to accept this definition even if P(A) or P(B) = 0.

Formula (3.3.1) may be used as a simple multiplication rule if we
can judge a priori that the events A and B are independent. In a
practical problem, two events may be taken to be stochastically
independent if there is no causal relation between them, i.e. if they are
causally independent.


Generalisation. Take three events A, B, C which are pairwise 
independent, i.e. 
P(AB) = P(A)P(B), P(BC) = P(B)P(C) 
P(CA) = P(C)P(A) (3.3.2) 


Then we may be led to think that perhaps in this case it follows 
that the events A and BC etc. are also independent. That such a 
conjecture is false will be apparent from the following example. 


Example 1. Let a coin be tossed twice so that the event space consists of the
four points (H, H), (H, T), (T, H) and (T, T).

Define events A, B, C to be ‘head in the first toss’, ‘head in the second toss’
and ‘one head’ (that is, exactly one head) respectively. Then A contains the two
points (H, H), (H, T), B the two points (H, H), (T, H) and C the two points
(H, T), (T, H). AB contains the only point (H, H), BC the point (T, H) and CA
the point (H, T), and the event ABC is impossible. Hence

P(A) = P(B) = P(C) = 1/2
P(AB) = P(BC) = P(CA) = 1/4
P(ABC) = 0

so that conditions (3.3.2) are satisfied but P(ABC) ≠ P(A)P(BC), i.e. A and BC are
not independent.
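
The example can be verified by listing the four event points. In the Python sketch below events are represented as predicates on event points; this representation and the helper names are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

SPACE = list(product("HT", repeat=2))        # the four equally probable points

def prob(event) -> Fraction:
    """Classical probability of an event given as a predicate on event points."""
    return Fraction(sum(1 for point in SPACE if event(point)), len(SPACE))

def A(point): return point[0] == "H"         # head in the first toss
def B(point): return point[1] == "H"         # head in the second toss
def C(point): return point.count("H") == 1   # exactly one head

def both(*events):
    """Product (joint occurrence) of the given events."""
    return lambda point: all(event(point) for event in events)

pairwise = all(prob(both(x, y)) == prob(x) * prob(y)
               for x, y in [(A, B), (B, C), (C, A)])
mutual = pairwise and prob(both(A, B, C)) == prob(A) * prob(B) * prob(C)
print(pairwise, mutual)                      # True False
```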

Now suppose that A, B, C are pairwise independent events, and
further that the events A and BC etc. are independent. Then (3.3.2)
is satisfied and

P(ABC) = P(A)P(BC)

or by (3.3.2)

$$P(ABC) = P(A)P(B)P(C) \tag{3.3.3}$$

Conversely, if the conditions (3.3.2) and (3.3.3) are both satisfied,
then A, B, C are obviously pairwise independent, and

P(ABC) = P(A)P(B)P(C) = P(A)P(BC)

so that A and BC are independent. Similarly, it follows that B and
CA are independent, as also C and AB.

These considerations lead to the following enlargement of the concept
of independence for three events.

The events A, B, C are said to be mutually independent if both the
conditions (3.3.2) and (3.3.3) are satisfied.

In general, we define n events $A_1, A_2, \ldots, A_n$ to be mutually independent
if the following relations hold:

$P(A_i A_j) = P(A_i)P(A_j)$ (i < j; i, j any combination of 1, 2, ..., n taken 2 at a time)
$P(A_i A_j A_k) = P(A_i)P(A_j)P(A_k)$ (i < j < k; i, j, k any combination of 1, 2, ..., n taken 3 at a time) (3.3.4)
......
$P(A_1 A_2 \cdots A_n) = P(A_1)P(A_2) \cdots P(A_n)$


Remark. We have already noted that pairwise independence does 
not imply mutual independence of the events. But events which are 
pairwise independent but not mutually so are rather artificial and 
rarely met in practice. 


Examples 
2. In Ex. 1, Sec. 3.2 show that the events A and B are independent.

We have P(A) = 1/2, P(B) = 1/3, and P(AB) = 1/6. Therefore (3.3.1) holds, and
hence the conclusion.

3. A card is drawn from a pack. What is the probability that it is the queen of
spades?

Let A = ‘queen’ and B = ‘spade’; then AB = ‘queen of spades’.

Since there is no causal connection between the events ‘queen’ and ‘spade’, we
can take A and B to be independent, and make use of the multiplication rule
(3.3.1). Now

P(A) = 4/52 = 1/13, P(B) = 13/52 = 1/4
P(AB) = P(A)P(B) = 1/52

4. A coin is tossed, and a die is thrown. Show that the events ‘head’ and
‘six’ are independent.

The event space consists of 12 points, viz. (H, 1), (H, 2), ..., (H, 6), and (T, 1),
(T, 2), ..., (T, 6). If A = ‘head’ and B = ‘six’, then AB = (H, 6) and

P(A) = 6/12 = 1/2, P(B) = 2/12 = 1/6, P(AB) = 1/12

Hence P(AB) = P(A)P(B), i.e. A and B are independent.


Remark. Since the die is thrown independent of the toss, it is natural to
expect that the events ‘head’ and ‘six’ are independent. The independence of the
toss and the throw of the die is, however, only meant in the intuitive sense, for we
have not yet defined such independence of two random experiments. The only
thing we have assumed in the above proof is that the 12 event points are equally
probable.


3.4 EXERCISES 


1. Prove the following relations:

$$P(\bar{A} + \bar{B}) = 1 - P(AB), \qquad P(\bar{A}\bar{B}) = 1 - P(A) - P(B) + P(AB)$$

$$P(\bar{A} + B) = 1 - P(A) + P(AB), \qquad P(\bar{A}B) = P(B) - P(AB)$$

2. Show that the probability of occurrence of only one of the events A and
B is P(A) + P(B) - 2P(AB).

3. Boole's Inequality. Prove that

$$P(A_1 + A_2 + \cdots + A_n) \le P(A_1) + P(A_2) + \cdots + P(A_n)$$

4. A coin is tossed n times in succession. Find the probability of r (< n)
heads.


5. What is the probability of an odd sum when two dice are thrown ? 


6. Two cards are drawn from a well-shuffled pack. Find the probability that 
at least one of them is a spade. 

7. Two urns contain respectively 3 white, 7 red, 15 black balls, and 10 white,
6 red, 9 black balls. One ball is drawn from each urn. Find the probability that
both the balls are of the same colour.

8. An urn contains three balls numbered 1, 2 and 3. Two balls are drawn
successively, the first ball drawn being replaced. Find the probability that the
sum of the two numbers is 5.

9. De Mere's Paradox. Show that the probability of obtaining six at least
once in 4 throws with a die is slightly greater than 1/2, and that of obtaining double
six at least once in 24 throws with two dice is slightly less than 1/2.

10. Find the minimum number of times a die has to be thrown such that the
probability of no six is less than 1/2.

11. The numbers 1, 2, ..., n are arranged in random order. What is the
probability that the numbers 1 and 2 are always together?

12. From the numbers 1, 2, ..., 2n + 1 three are chosen at random. Prove that
the probability that these are in arithmetical progression is $3n/(4n^2 - 1)$.

13. A coin is tossed m + n times (m > n). Show that the probability of exactly
m consecutive heads is $(n+3)/2^{m+2}$, and that of at least m consecutive heads is
$(n+2)/2^{m+1}$.

14. From an urn containing n balls any number of balls are drawn. Show
that the probability of drawing an even number of balls is $(2^{n-1} - 1)/(2^n - 1)$.


15. If an even number of cards are drawn from a full pack, find the proba- 
bility that these consist half of red and half of black. 


16. What is the probability that a bridge hand will contain at least one ace ? 


17. What is the probability that the combined bridge hands of ‘north’ and 
‘south’ contain all the 4 aces ? 

18. 100 prizes will be given in a lottery of 10,000 tickets. Find the minimum 
number of tickets a person has to buy in order that the probability of his winning 
at least one prize is greater than 3. 

19. Find the probability that a bridge hand contains all 13 face values. 

20. If four persons are selected at random from a group of 3 men, 2 women 
and 4 children, what is the probability that among these there are 1 man, 1 woman 
and 2 children ? 

21. What is the probability that a bridge hand contains 5 cards of some suit, 
4 of another, 3 of a third, and 1 of the last suit ? 

22. From an urn containing n tickets numbered 1, 2, ..., n, r tickets are drawn
simultaneously and arranged in increasing order of their numbers:
$x_1 < x_2 < \cdots < x_r$. Show that the probability that $x_i = s$ is

$$\binom{s-1}{i-1} \binom{n-s}{r-i} \Big/ \binom{n}{r}$$

23. An urn contains $N_1$ white and $N_2$ black balls. Two players A and B
alternately draw a ball without replacement, and the one who draws the first white
ball wins the game. If A begins to draw, find the probability of his winning.

24. If cards are successively drawn without replacement from a full pack,
what is the probability that five cards will precede the first ace?

25. An urn contains $N_1$ white and $N_2$ black balls, from which k balls are
drawn one by one without replacement and laid aside, their colour being unnoted.
Then one more ball is drawn. Find the probability that it is white.

26. From an urn containing $N_1$ white and $N_2$ black balls ($N = N_1 + N_2$), balls
are successively drawn without replacement until only those of the same colour
are left. Prove that the probability that the balls left are white is $N_1/N$.

27. Find the probability of obtaining 14 with 3 dice, and show that it is the
same with 5 dice.

28. If r balls are distributed at random in n cells, prove that the probability $p_i$
that a given cell contains exactly i balls is given by

$$p_i = \binom{r}{i} \frac{(n-1)^{r-i}}{n^r}$$

Further show that the most probable number(s) $i_m$ of balls in a given cell is
determined by the inequalities

$$(r+1)/n - 1 \le i_m \le (r+1)/n$$

i.e. if (r+1)/n is not an integer, $i_m$ = the greatest integer less than (r+1)/n, and
if (r+1)/n is an integer, $i_m$ = (r+1)/n - 1 or (r+1)/n.

29. If n objects are distributed at random among a men and b (< a) women,
then show that the probability that the women get an odd number of objects is

$$\frac{(a+b)^n - (a-b)^n}{2(a+b)^n}$$


30. Let r indistinguishable particles be placed at random into n cells. If the
particles obey ‘Bose-Einstein statistics’, prove that the probability that there are
exactly i particles in a given cell is

$$\binom{n+r-i-2}{r-i} \Big/ \binom{n+r-1}{r}$$

Show also that the most probable number of particles in a given cell is zero,
provided n > 2.

31. If r indistinguishable particles obeying ‘Fermi-Dirac statistics’ are placed
at random into n cells, prove that the probability that a given cell is empty is
1 - r/n.


32. An urn contains 4 white and 6 black balls. Two balls are successively 
drawn from the urn without replacement of the first ball. If the first ball is seen 
to be white, what is the probability that the second ball is also white ? 


33. A secretary writes four letters and the corresponding addresses on
envelopes. If he inserts the letters in the envelopes at random irrespective of the
addresses, find the probability that only one letter is placed in the corresponding
envelope. Also calculate the probability that all the letters are wrongly placed.


34. Ten students have identical raincoats which they hang on the same rack
while attending class. After the class each student selects a raincoat at random
and goes home. What is the probability that at least one raincoat goes to its
original owner?


35. Two urns contain respectively 2 white and 1 black balls, and 1 white and
5 black balls. One ball is transferred from the first to the second urn, and then
a ball is drawn from the second urn. What is the probability that the ball drawn
is white?

36. There are two identical urns containing respectively 4 white and 3 red balls
and 3 white and 7 red balls. An urn is chosen at random, and a ball is drawn
from it. Find the probability that the ball is white. If the ball drawn is white,
what is the probability that it is from the first urn?

37. There are three identical boxes, each provided with two drawers. In the
first, each drawer contains a gold coin; in the third, each drawer contains a silver
coin; and in the second, one drawer contains a gold and the other a silver coin.
A box is selected at random, and one of the drawers is opened. If a gold coin is
found, what is the probability that the box chosen is the second one?



38. Three urns contain respectively 1 white and 2 black balls; 2 white and 
1 black balls ; 2 white and 2 black balls. One ball is transferred from the first to 
the second urn; then one ball is transferred from the second to the third urn; 
finally one ball is drawn from the third urn. Find the probability that the ball 
is white. 

39. There are n urns each containing N balls, of which $N_1$ are white and $N_2$
black. One ball is transferred from the 1st to the 2nd urn; then one ball is
transferred from the 2nd to the 3rd urn and so on; finally one ball is drawn from
the nth urn. Prove that the probability that the ball is white is $N_1/N$.

40. If two events A and B are independent, show that $\bar{A}$ and B are independent,
and hence that $\bar{A}$ and $\bar{B}$ are also independent.

41. Let A, B, C be mutually independent events. Then prove that A and B + C
are independent and also that $\bar{A}, \bar{B}, \bar{C}$ are mutually independent.

42. If the probabilities of n mutually independent events be $p_1, p_2, \ldots, p_n$,
then show that the probability that at least one of the events will occur is
$1 - (1 - p_1)(1 - p_2) \cdots (1 - p_n)$.

43. The outcome of an experiment is equally likely to be one of the four points
in three-dimensional space with rectangular co-ordinates (1, 0, 0), (0, 1, 0), (0, 0, 1)
and (1, 1, 1). If A, B, C denote the events ‘x-co-ordinate 1’, ‘y-co-ordinate 1’,
‘z-co-ordinate 1’ respectively, then check if A, B, C are mutually independent.


CHAPTER 4 


COMPOUND EXPERIMENTS 


4.1 CARTESIAN PRODUCT OF SETS 


Let S and T be any two sets. The cartesian product of S and T,
denoted by S × T, is defined to be the set of all ordered pairs (x, y)
where x ∈ S and y ∈ T.

For example, if S and T are finite sets such that S contains the
m elements $x_1, x_2, \ldots, x_m$ and T contains the n elements $y_1, y_2, \ldots, y_n$,
then S × T contains the mn ordered pairs

$$(x_i, y_j) \quad (i = 1, 2, \ldots, m;\; j = 1, 2, \ldots, n)$$

We shall also write

$S \times S = S^2$, $S \times S \times S = S^3$ etc.

4.2 JOINT INDEPENDENT EXPERIMENTS 
Let E be a random experiment described by the event space S which
contains m event points, viz. $U_1, U_2, \ldots, U_m$ having given probabilities

$$P(U_i) = p_i \quad (i = 1, 2, \ldots, m) \tag{4.2.1}$$

so that

$$\sum_{i=1}^{m} p_i = 1 \tag{4.2.2}$$

Let E' be another experiment and S' its event space containing the
n points $U_1', U_2', \ldots, U_n'$ with probabilities

$$P(U_j') = p_j' \quad (j = 1, 2, \ldots, n) \tag{4.2.3}$$

such that

$$\sum_{j=1}^{n} p_j' = 1 \tag{4.2.4}$$


The experiments E and E' are performed successively in such a
manner that the second experiment E' is independent of the result of
the first experiment E, i.e. the result of E does not in any way affect
the performance of E'. We are, in fact, going to define the independence
of the experiments E and E', but for motivating the same we argue
intuitively as follows.

The joint performance of E and E' we shall call the compound
experiment E''. The event points connected with E'' will then be

$(U_i, U_j') \quad (i = 1, 2, \dots, m;\ j = 1, 2, \dots, n)$
so that the corresponding event space $S'' = S \times S'$.


Now consider the two events ‘$U_i$ occurs in E’ and ‘$U_j'$ occurs in E'’,
both connected with E''. The former contains the event points

$(U_i, U_1'), (U_i, U_2'), \dots, (U_i, U_n')$
and the latter the points
$(U_1, U_j'), (U_2, U_j'), \dots, (U_m, U_j')$

so that their product is the event point $(U_i, U_j')$. Since the first
experiment is independent of the second experiment, the event ‘$U_i$
occurs in E’ connected with E'' may be simply regarded as the event
point $U_i$ connected with E, which has a probability $p_i$. Similarly,
the event ‘$U_j'$ occurs in E'’ has a probability $p_j'$. Now since the
experiments are independent, it is reasonable to assume the events
‘$U_i$ occurs in E’ and ‘$U_j'$ occurs in E'’ to be stochastically independent,
and hence

$P\{(U_i, U_j')\} = p_i p_j' \quad (i = 1, 2, \dots, m;\ j = 1, 2, \dots, n)$ (4.2.5)


We now define the experiments E and E' to be independent if the
assignment of probabilities to the different event points of S'' is given
by (4.2.5). We have

$S'' = \sum_i \sum_j (U_i, U_j')$
So
$P(S'') = \sum_i \sum_j P\{(U_i, U_j')\} = \sum_i \sum_j p_i p_j'$
$= \left(\sum_i p_i\right)\left(\sum_j p_j'\right) = 1 \cdot 1 = 1$

which is a necessary condition.




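Theorem. If A and B are any two events connected with E and E'
respectively and E, E' are independent, then
$P\{(A, B)\} = P(A)\,P(B)$ (4.2.6)

Proof. Let

$A = \sum_{\alpha} U_{\alpha}, \quad B = \sum_{\beta} U_{\beta}'$

where the indices $\alpha$ and $\beta$ run over some subsets of the sets 1, 2, ..., m
and 1, 2, ..., n respectively. Then

$P(A) = \sum_{\alpha} P(U_{\alpha}) = \sum_{\alpha} p_{\alpha}, \quad P(B) = \sum_{\beta} P(U_{\beta}') = \sum_{\beta} p_{\beta}'$

The event (A, B) connected with E'' may be written as

$(A, B) = \sum_{\alpha} \sum_{\beta} (U_{\alpha}, U_{\beta}')$

Hence

$P\{(A, B)\} = \sum_{\alpha} \sum_{\beta} P\{(U_{\alpha}, U_{\beta}')\} = \sum_{\alpha} \sum_{\beta} p_{\alpha} p_{\beta}'$
$= \left(\sum_{\alpha} p_{\alpha}\right)\left(\sum_{\beta} p_{\beta}'\right) = P(A)\,P(B)$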

The above formulation may be easily generalised to more than two
experiments.
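
The product assignment (4.2.5) and the theorem (4.2.6) can be checked on a small case. The sketch below is only an illustration: the two experiments (a fair die for E and a biased coin for E') and the events `A` and `B` are hypothetical choices, not taken from the text.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical experiments: E is a fair die, E' is a biased coin.
p = {face: F(1, 6) for face in range(1, 7)}   # probabilities p_i on S
p_prime = {"H": F(2, 3), "T": F(1, 3)}        # probabilities p_j' on S'

# (4.2.5): each point (U_i, U_j') of S'' = S x S' gets probability p_i * p_j'.
joint = {(u, v): p[u] * p_prime[v] for u, v in product(p, p_prime)}

def prob(event):
    """Probability of an event of S'', given as a set of points (u, v)."""
    return sum(joint[point] for point in event)

A = {2, 4, 6}   # event connected with E: an even face
B = {"H"}       # event connected with E': a head

total = sum(joint.values())                   # should be 1
lhs = prob(set(product(A, B)))                # P{(A, B)}
rhs = sum(p[u] for u in A) * sum(p_prime[v] for v in B)  # P(A) P(B)
print(total, lhs, rhs)
```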


4.3 REPEATED INDEPENDENT TRIALS 


If the experiment E is itself repeated twice, then the compound
experiment $E_2$ will have event space $S \times S = S^2$ which contains the $m^2$
points $(U_i, U_j)$ (i, j = 1, 2, ..., m). The independence of two trials
will be realised in practice if they are performed under identical
conditions and mathematically defined by

$P\{(U_i, U_j)\} = p_i p_j \quad (i, j = 1, 2, \dots, m)$ (4.3.1)
which follows from (4.2.5).

For r independent trials of E, the compound experiment will be
denoted by $E_r$, the corresponding event space being $S^r$. There are $m^r$
points in $S^r$, viz.

$(U_{i_1}, U_{i_2}, \dots, U_{i_r}) \quad (i_1, i_2, \dots, i_r = 1, 2, \dots, m)$

their probabilities being given by

$P\{(U_{i_1}, U_{i_2}, \dots, U_{i_r})\} = p_{i_1} p_{i_2} \dots p_{i_r} \quad (i_1, i_2, \dots, i_r = 1, 2, \dots, m)$ (4.3.2)
The generalisation of the last theorem will be


Theorem. Let $A_1, A_2, \dots, A_r$ be any events connected with the
random experiment E. Then for r independent trials of E

$P\{(A_1, A_2, \dots, A_r)\} = P(A_1)\,P(A_2) \dots P(A_r)$ (4.3.3)


Example. From an urn containing n tickets numbered 1, 2, ..., n, k
tickets are drawn at a time and replaced before the next drawing.
Find the probability that in r such drawings, tickets no. 1, 2, ..., r do
not appear in the 1st, 2nd, ..., rth drawings respectively.

Let $A_i$ be the event that the ith ticket does not appear in a single
drawing of k tickets (i = 1, 2, ..., r). Then

$P(A_i) = \binom{n-1}{k} \Big/ \binom{n}{k} = \frac{n-k}{n}$

Since the tickets drawn are always replaced, we have r independent trials
of the above experiment, and the required event is $(A_1, A_2, \dots, A_r)$ so
that by (4.3.3)

$P\{(A_1, A_2, \dots, A_r)\} = (n-k)^r / n^r$
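
The result $(n-k)^r/n^r$ can be confirmed for small cases by listing every outcome of the compound experiment. The following is a brute-force sketch; the function name `prob_none_in_own_draw` is an illustrative choice, and it is assumed that every k-subset of tickets is equally likely in each drawing.

```python
from fractions import Fraction
from itertools import combinations, product

def prob_none_in_own_draw(n, k, r):
    """Exact probability that ticket j is absent from the j-th drawing, j = 1..r.

    Each drawing is a k-subset of the tickets 1..n, all subsets equally likely,
    and the r drawings are independent (tickets are replaced).
    """
    draws = list(combinations(range(1, n + 1), k))
    favourable = 0
    total = 0
    for outcome in product(draws, repeat=r):
        total += 1
        if all((j + 1) not in outcome[j] for j in range(r)):
            favourable += 1
    return Fraction(favourable, total)

print(prob_none_in_own_draw(5, 2, 2), Fraction(5 - 2, 5) ** 2)
```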


4.4 BERNOULLI TRIALS 

If a random experiment be such that its event space consists of only 
two points which are usually called ‘success’ and ‘failure’, then a 
sequence of independent trials of the experiment will be called a 
Bernoullian sequence of trials, provided the probability of ‘success’ (or 
‘failure’) remains the same for all trials. 

Let E be the given random experiment, its event space S containing
the two points ‘success’ and ‘failure’, to be denoted by the symbols s
and f respectively. Let p be the probability of success, i.e.

$P(s) = p, \quad P(f) = 1 - p = q$ (say)

The compound experiment of n independent trials of E, denoted
by $E_n$, has event space $S^n$ containing $2^n$ points represented by succes-
sions of the symbols s and f of the type (s, s, f, s, ..., f). By (4.3.2) their
probabilities are given by

$P\{(s, s, f, s, \dots, f)\} = p \cdot p \cdot q \cdot p \dots q$ (4.4.1)

Binomial law. Let $A_i$ denote the event ‘i successes’ (consequently
n-i failures) connected with the compound experiment $E_n$. Let the
n trials be represented by n rooms. Then the required event means
that of these n rooms i rooms are selected in which we place the
symbols s, the remaining n-i rooms being filled by the symbols f.
Since i rooms can be chosen out of n rooms in $\binom{n}{i}$ different ways, the
event $A_i$ contains $\binom{n}{i}$ event points each having probability $p^i q^{n-i}$, so
that

$P(A_i) = \binom{n}{i} p^i q^{n-i}$
$= \binom{n}{i} p^i (1-p)^{n-i} \quad (i = 0, 1, 2, \dots, n)$ (4.4.2)
This is called the binomial law.
1. The events $A_0, A_1, \dots, A_n$ are pairwise mutually exclusive, one
of which necessarily occurs, i.e.

$\sum_{i=0}^{n} A_i = S^n$

So
$\sum_{i=0}^{n} P(A_i) = \sum_{i=0}^{n} \binom{n}{i} p^i q^{n-i} = (p+q)^n = 1$

2. $P(A_0) = q^n$, which is the probability of no success at all. Hence
the probability of at least one success $= P(\bar{A}_0) = 1 - q^n$. And $P(A_n) =
p^n$, i.e. the probability of a complete run of successes is $p^n$.


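3. Next we prove the following identity :

$\sum_{i=k}^{n} \binom{n}{i} p^i q^{n-i} = \frac{\int_0^p x^{k-1} (1-x)^{n-k}\,dx}{\int_0^1 x^{k-1} (1-x)^{n-k}\,dx}$ (4.4.3)

where the L.H.S. clearly represents the probability of at least k
successes in n trials.

Let $I_k$ denote the R.H.S. Since $\int_0^1 x^{k-1}(1-x)^{n-k}\,dx = \frac{(k-1)!\,(n-k)!}{n!}$, integrating by parts we get

$I_k = \frac{n!}{(k-1)!\,(n-k)!} \int_0^p x^{k-1} (1-x)^{n-k}\,dx$
$= \frac{n!}{(k-1)!\,(n-k)!} \left[ \frac{p^k (1-p)^{n-k}}{k} + \frac{n-k}{k} \int_0^p x^k (1-x)^{n-k-1}\,dx \right]$

or
$I_k - I_{k+1} = \binom{n}{k} p^k q^{n-k}$

Replacing k by k+1, k+2, ..., n-1 and adding all these results and
noting that $I_n = p^n$, the identity (4.4.3) follows.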


Remark. We know $B(l, m) = \int_0^1 x^{l-1} (1-x)^{m-1}\,dx$ is the beta
function, and $B_p(l, m) = \int_0^p x^{l-1} (1-x)^{m-1}\,dx$ is called the incomplete
beta function. Tables of $B_p(k, n-k+1)/B(k, n-k+1)$ have been
prepared for different values of p, n, k, from which the individual
terms $\binom{n}{i} p^i q^{n-i}$ of the binomial law may be easily obtained.
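
The identity (4.4.3) can be verified exactly for integer n and k. The sketch below is one possible check, not the tabulation method referred to in the Remark: it evaluates the incomplete beta integral by expanding $(1-x)^{m-1}$ with the binomial theorem and integrating term by term, using exact rational arithmetic. The function names are illustrative.

```python
from fractions import Fraction
from math import comb

def upper_tail(n, k, p):
    """L.H.S. of (4.4.3): probability of at least k successes in n trials."""
    q = 1 - p
    return sum(comb(n, i) * p**i * q**(n - i) for i in range(k, n + 1))

def incomplete_beta(x, l, m):
    """B_x(l, m) for positive integers l, m.

    Expands (1 - t)^(m-1) binomially and integrates t^(l-1+j) from 0 to x.
    """
    return sum(
        comb(m - 1, j) * (-1) ** j * x ** (l + j) / (l + j)
        for j in range(m)
    )

def beta_ratio(n, k, p):
    """R.H.S. of (4.4.3): B_p(k, n-k+1) / B(k, n-k+1), for 1 <= k <= n."""
    return incomplete_beta(p, k, n - k + 1) / incomplete_beta(Fraction(1), k, n - k + 1)

p = Fraction(1, 3)
for k in range(1, 7):
    print(k, upper_tail(6, k, p), beta_ratio(6, k, p))
```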


4. GREATEST TERM. Let $P(A_i)$ be maximum when $i = i_m$, so that
$i_m$ may be called the most probable number of successes. We have

$\frac{P(A_i)}{P(A_{i+1})} = \frac{(i+1)q}{(n-i)p} = 1 + \frac{i - (n+1)p + 1}{(n-i)p}$

If (n+1)p is not an integer, let r denote the greatest integer less
than (n+1)p, i.e.
$(n+1)p - 1 < r < (n+1)p$
Then $P(A_0) < P(A_1) < \dots < P(A_r) > P(A_{r+1}) > \dots > P(A_n)$ which
shows that $i_m = r$.

If (n+1)p is an integer, writing s = (n+1)p, we have

$P(A_0) < P(A_1) < \dots < P(A_{s-1}) = P(A_s) > P(A_{s+1}) > \dots > P(A_n)$
so that $i_m = s - 1$ or s.

The results of both the cases may be combined into the single
statement that $i_m$ is the integer(s) determined by the following inequa-
lities :

$(n+1)p - 1 \le i_m \le (n+1)p$ (4.4.4)
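
The inequalities (4.4.4) can be compared with a direct search for the largest term of the binomial law. The following sketch uses exact fractions so that the tie in the integer case is detected reliably; the function names and the two test cases are illustrative assumptions.

```python
from fractions import Fraction
from math import comb

def binomial_modes(n, p):
    """Indices i at which P(A_i) = C(n, i) p^i q^(n-i) is greatest."""
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    best = max(probs)
    return [i for i, value in enumerate(probs) if value == best]

def modes_from_inequality(n, p):
    """Integers i in 0..n satisfying (n+1)p - 1 <= i <= (n+1)p, as in (4.4.4)."""
    upper = (n + 1) * p
    return [i for i in range(n + 1) if upper - 1 <= i <= upper]

# (n+1)p = 33/10 is not an integer: a single most probable number.
print(binomial_modes(10, Fraction(3, 10)), modes_from_inequality(10, Fraction(3, 10)))
# (n+1)p = 5 is an integer: two most probable numbers, s-1 and s.
print(binomial_modes(9, Fraction(1, 2)), modes_from_inequality(9, Fraction(1, 2)))
```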




Examples 


1. In Ex. 1, Sec. 3.1 we can find the probability of 2 heads directly by the
binomial law (4.4.2). Our random experiment now consists in tossing the coin
and observing if it is head, so that the event ‘head’ may be called a success and
‘tail’ a failure. Here the probability of success, p = 1/2, so that q = 1 - 1/2 = 1/2. Hence
the required probability is


rad- (33) (5)-3 


2. Take Ex. 4, Sec. 3.1. If the experiment consists in throwing the die and
seeing if the result is six, the probability of success, p = 1/6 and q = 5/6. Therefore
the probability of at least one six in k throws with the die $= 1 - (5/6)^k$.


3. A and B play a game which must be either won or lost. If the
probability that A wins any game is p, find the probability that A wins
m games before B wins n games (m, n ≥ 1).

The issue will be clearly decided in m+n-1 games, and the
required event is equivalent to, i.e. implies and is implied by, the event
that of these m+n-1 games A wins at least m games. Now calling A's
winning a game a success, we have m+n-1 Bernoulli trials with
probability of success p, and the required event being at least
m successes, its probability, using (4.4.3), is

$\sum_{i=m}^{m+n-1} \binom{m+n-1}{i} p^i q^{m+n-1-i} = \frac{\int_0^p x^{m-1} (1-x)^{n-1}\,dx}{\int_0^1 x^{m-1} (1-x)^{n-1}\,dx}$


4. DRAWINGS WITH REPLACEMENT. Let an urn contain
$N = N_1 + N_2$ balls, of which $N_1$ are white and $N_2$ black. If n balls
are drawn successively from the urn, the ball drawn being replaced
each time, find the probability that i drawings will yield white balls.

We know the probability of drawing a white ball $= N_1/N$. Consider
the experiment of drawing a ball from the urn and noting if its colour
is white. Then the probability of success, $p = N_1/N$ and $q = 1 - p =
N_2/N$. The drawings in question may be regarded as n Bernoulli
trials with probability of success p, so that by (4.4.2) the required
probability is

$\binom{n}{i} p^i q^{n-i}$


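DRAWINGS WITHOUT REPLACEMENT. If the balls are drawn
without replacements, or, in other words, all the n balls are drawn
simultaneously, the problem becomes identical with Ex. 6, Sec. 3.1,
where we found the probability of i white balls to be

$\binom{N_1}{i} \binom{N_2}{n-i} \Big/ \binom{N}{n}$

It will be interesting to find the limiting form of the above expression
as $N \to \infty$, keeping p fixed and hence $N_1, N_2 \to \infty$.

The expression

$= \frac{N_1 (N_1 - 1) \dots (N_1 - i + 1)}{i!} \cdot \frac{N_2 (N_2 - 1) \dots (N_2 - n + i + 1)}{(n-i)!} \times \frac{n!}{N (N-1) \dots (N - n + 1)}$

$= \binom{n}{i} \frac{p \left(p - \frac{1}{N}\right) \dots \left(p - \frac{i-1}{N}\right) \cdot q \left(q - \frac{1}{N}\right) \dots \left(q - \frac{n-i-1}{N}\right)}{\left(1 - \frac{1}{N}\right) \left(1 - \frac{2}{N}\right) \dots \left(1 - \frac{n-1}{N}\right)}$

$\to \binom{n}{i} p^i q^{n-i}$ as $N \to \infty$

Thus for drawings without replacement, we get the binomial law in the
limiting case.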
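
The limiting statement can be illustrated numerically. In the sketch below the values n = 10, i = 3, p = 0.3 and the urn sizes N are arbitrary choices made for illustration; the gap between the without-replacement probability and the binomial term is recorded for each N.

```python
from math import comb

def hypergeometric(N1, N2, n, i):
    """Probability of i white balls when n balls are drawn without replacement."""
    return comb(N1, i) * comb(N2, n - i) / comb(N1 + N2, n)

def binomial(n, i, p):
    """Binomial law (4.4.2)."""
    return comb(n, i) * p**i * (1 - p) ** (n - i)

n, i, p = 10, 3, 0.3          # illustrative values, p = N1/N kept fixed
gaps = []
for N in (100, 1_000, 10_000, 100_000):
    N1 = round(p * N)         # number of white balls
    gaps.append(abs(hypergeometric(N1, N - N1, n, i) - binomial(n, i, p)))

print(gaps)  # the differences shrink as N grows
```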


5. BANACH'S MATCH-BOX PROBLEM. A mathematician always
carries two match-boxes, each containing n matches. Whenever he
needs, he chooses a box at random and draws a match from it. Find
the probability that when the first box is found to be empty for the
first time, the second box will contain exactly i matches.

The event in question means that the first box is chosen n+1 times
corresponding to the n matches drawn from it and the case when it is
found empty, and the second box is chosen n-i times so that
i matches are left in it, and further in the last drawing the first box is
chosen when it is found empty, but in the remaining drawings the
two boxes may be chosen in arbitrary order. Suppose we consider
the experiment of choosing a box at random and call the choice of
the first box a success. Thus we have a Bernoullian sequence of
n+1+n-i = 2n-i+1 trials with probability of success p = 1/2, and the
probability of occurrence of n successes in the first 2n-i trials (the
successes appearing in any order) and a success in the last trial is, by
(4.3.3) and (4.4.2)

$\binom{2n-i}{n} \left(\tfrac{1}{2}\right)^n \left(\tfrac{1}{2}\right)^{n-i} \times \tfrac{1}{2} = \binom{2n-i}{n} \left(\tfrac{1}{2}\right)^{2n-i+1}$ (4.4.6)
which is the required answer.
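
Formula (4.4.6) can be compared with a step-by-step computation that follows the two boxes directly. The sketch below is one way to do this, with hypothetical function names; it also checks that the probabilities for i = 0, 1, ..., n add up to 1/2, as expected when either box is equally likely to be the one first found empty.

```python
from fractions import Fraction
from functools import lru_cache
from math import comb

def banach_formula(n, i):
    """Formula (4.4.6): C(2n-i, n) (1/2)^(2n-i+1)."""
    return comb(2 * n - i, n) * Fraction(1, 2) ** (2 * n - i + 1)

def banach_exact(n, i):
    """Same probability by following the state (a, b) = matches left in each box."""
    half = Fraction(1, 2)

    @lru_cache(maxsize=None)
    def f(a, b):
        # First box chosen: if it is empty the process stops here,
        # successfully only when the second box holds exactly i matches.
        first = Fraction(int(b == i)) if a == 0 else f(a - 1, b)
        # Second box chosen: if it is empty, it was found empty first (failure).
        second = Fraction(0) if b == 0 else f(a, b - 1)
        return half * (first + second)

    return f(n, n)

n = 4
print([banach_formula(n, i) for i in range(n + 1)])
```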


6. An urn contains n tickets numbered 1 to n, from which a ticket
is drawn and replaced r times. What is the probability that the
greatest number drawn is i ?

Let $X_i$ denote the event that the greatest number drawn is less
than or equal to i, which is identical with the event that each of the
r drawings yields a ticket whose number is less than or equal to i.
Consider the random experiment of drawing a ticket from the urn and
observing if its number is less than or equal to i, i.e. one of the
numbers 1, 2, ..., i, so that the probability of success is i/n. Since $X_i$ is the
same as the event of all successes in r Bernoulli trials, $P(X_i) = (i/n)^r$.

Now the required event is obviously $X_i - X_{i-1}$, where $X_{i-1} \subset X_i$,
so that by (3.1.4) the required probability is

$P(X_i - X_{i-1}) = P(X_i) - P(X_{i-1})$
$= [\,i^r - (i-1)^r\,]/n^r$ (4.4.7)
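
Formula (4.4.7) is easy to confirm by listing all $n^r$ equally likely sequences of drawings for small n and r. The following is a minimal sketch with illustrative function names.

```python
from fractions import Fraction
from itertools import product

def prob_max_equals(n, r, i):
    """Exact probability that the greatest of r tickets drawn from 1..n is i."""
    outcomes = product(range(1, n + 1), repeat=r)
    hits = sum(1 for outcome in outcomes if max(outcome) == i)
    return Fraction(hits, n**r)

def formula(n, r, i):
    """Formula (4.4.7): [i^r - (i-1)^r] / n^r."""
    return Fraction(i**r - (i - 1) ** r, n**r)

print(prob_max_equals(4, 2, 3), formula(4, 2, 3))
```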


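Poisson approximation

Set $p = \mu/n$, where $\mu$ is a given positive number, and we pass to the
limit as $n \to \infty$ and hence $p \to 0$. We have

$P(A_i) = \binom{n}{i} p^i (1-p)^{n-i} = \frac{n(n-1) \dots (n-i+1)}{i!} \left(\frac{\mu}{n}\right)^i \left(1 - \frac{\mu}{n}\right)^{n-i}$

$= \frac{\mu^i}{i!} \left(1 - \frac{1}{n}\right) \left(1 - \frac{2}{n}\right) \dots \left(1 - \frac{i-1}{n}\right) \left(1 - \frac{\mu}{n}\right)^{n} \left(1 - \frac{\mu}{n}\right)^{-i} \to e^{-\mu} \frac{\mu^i}{i!}$ as $n \to \infty$

Hence, if the probability of success p is small and the number of trials
n large, such that $\mu = np$ is of moderate magnitude, we have the
approximation formula

$P(A_i) \approx e^{-\mu} \frac{\mu^i}{i!} \quad (i = 0, 1, 2, \dots)$ (4.4.8)
This is called the Poisson approximation to the binomial law.

We note, if $S^\infty$ denotes the limiting event space
$S^\infty = A_0 + A_1 + A_2 + \dots$ to $\infty$

Since $A_0, A_1, A_2, \dots$ are pairwise mutually exclusive, by Axiom III
$P(S^\infty) = P(A_0) + P(A_1) + P(A_2) + \dots$

$= e^{-\mu} \sum_{i=0}^{\infty} \frac{\mu^i}{i!} = e^{-\mu} e^{\mu} = 1$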

Example 7. An urn contains 1 white and 99 black balls. If 1000 drawings are
made with replacements, what is the probability of 10 white balls ?

Using the notations of Ex. 4, p = 1/100, n = 1000, so that $\mu = np = 10$.

Here p is small and n large such that $\mu$ is of moderate magnitude, and hence
we can conveniently use the Poisson approximation (4.4.8). Therefore the required
probability $= e^{-10}\, 10^{10}/10! \approx 0.125$.

The result given by the binomial law is 0.126, which shows that Poisson
approximation is fairly good.
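
The two figures quoted in this example can be reproduced directly from (4.4.2) and (4.4.8). The sketch below uses ordinary floating-point arithmetic, so the values are approximate.

```python
from math import comb, exp, factorial

n, p, i = 1000, 1 / 100, 10   # data of Example 7
mu = n * p                    # mu = np = 10

binomial = comb(n, i) * p**i * (1 - p) ** (n - i)   # binomial law (4.4.2)
poisson = exp(-mu) * mu**i / factorial(i)           # Poisson approximation (4.4.8)

print(round(binomial, 3), round(poisson, 3))
```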


4.5 POISSON TRIALS 
A sequence of independent trials of a random experiment, the event
space of which contains two points, success and failure, is called a
Poisson sequence of trials if the probability of success is not constant
but varies from one trial to another. It will be rather cumbrous to
deduce general formulae for a Poisson sequence of n trials, and let us,
for convenience, consider only three trials. Let the probabilities of
success be $p_1, p_2, p_3$ in the three trials respectively, and hence the
probabilities of failure are respectively $1 - p_1 = q_1$, $1 - p_2 = q_2$, $1 - p_3 = q_3$.
The event space of the compound experiment will then contain the
following 8 points :

$U_1 = (S, S, S)$, $U_2 = (S, S, F)$, $U_3 = (S, F, S)$,

$U_4 = (F, S, S)$, $U_5 = (F, F, S)$, $U_6 = (F, S, F)$,

$U_7 = (S, F, F)$, $U_8 = (F, F, F)$

Since the trials are independent, the distribution of probabilities in
this event space is given by

$P(U_1) = p_1 p_2 p_3$, $P(U_2) = p_1 p_2 q_3$, $P(U_3) = p_1 q_2 p_3$,
$P(U_4) = q_1 p_2 p_3$, $P(U_5) = q_1 q_2 p_3$, $P(U_6) = q_1 p_2 q_3$,
$P(U_7) = p_1 q_2 q_3$, $P(U_8) = q_1 q_2 q_3$

If $A_i$ denotes the event ‘i successes’, then
$A_0 = U_8$, $A_1 = U_5 + U_6 + U_7$, $A_2 = U_2 + U_3 + U_4$, $A_3 = U_1$
Hence
$P(A_0) = q_1 q_2 q_3$, $P(A_1) = q_1 q_2 p_3 + q_1 p_2 q_3 + p_1 q_2 q_3$,
$P(A_2) = p_1 p_2 q_3 + p_1 q_2 p_3 + q_1 p_2 p_3$, $P(A_3) = p_1 p_2 p_3$ (4.5.1)
and

$\sum_{i=0}^{3} P(A_i) = (p_1 + q_1)(p_2 + q_2)(p_3 + q_3) = 1$
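
The enumeration behind (4.5.1) extends mechanically to any number of trials. The sketch below lists every success/failure pattern and accumulates the probability of each number of successes; the function name and the probabilities 1/2, 1/3, 1/4 are hypothetical values chosen for illustration.

```python
from fractions import Fraction
from itertools import product

def success_count_distribution(ps):
    """Return [P(A_0), P(A_1), ..., P(A_n)] for a Poisson sequence of trials.

    ps[j] is the probability of success in trial j + 1; trials are independent.
    """
    dist = [Fraction(0)] * (len(ps) + 1)
    for outcome in product((True, False), repeat=len(ps)):
        weight = Fraction(1)
        for success, p in zip(outcome, ps):
            weight *= p if success else 1 - p
        dist[sum(outcome)] += weight   # sum(outcome) counts the successes
    return dist

dist = success_count_distribution([Fraction(1, 2), Fraction(1, 3), Fraction(1, 4)])
print(dist)
```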
4.6 MULTINOMIAL LAW 


This is another generalisation of the binomial law. Here the event
space S of a random experiment E contains, instead of two points as
in the Bernoullian case, m points, in general, $U_1, U_2, \dots, U_m$ such
that

$P(U_k) = p_k \quad (k = 1, 2, \dots, m)$ (4.6.1)
subject to

$\sum_{k=1}^{m} p_k = 1$ (4.6.2)

Let a sequence of n independent trials of E be performed. The
event space $S^n$ of the compound experiment $E_n$ will contain $m^n$ points
of the type, say, $(U_{j_1}, U_{j_2}, \dots, U_{j_n})$. Let $A_{i_1 i_2 \dots i_m}$ denote the
event ‘$U_1$ occurs $i_1$ times, $U_2$ occurs $i_2$ times, ..., $U_m$ occurs $i_m$ times’,
where $\sum_k i_k = n$, which contains $\frac{n!}{i_1!\, i_2! \dots i_m!}$ (being the number of
permutations of n things of which $i_1$ are alike, $i_2$ are alike, etc.) event
points each having probability $p_1^{i_1} p_2^{i_2} \dots p_m^{i_m}$. Therefore

$P(A_{i_1 i_2 \dots i_m}) = \frac{n!}{i_1!\, i_2! \dots i_m!}\, p_1^{i_1} p_2^{i_2} \dots p_m^{i_m}$ (4.6.3)

($i_1, i_2, \dots, i_m = 0, 1, 2, \dots, n$ such that $\sum_k i_k = n$)

The R.H.S. is the general term in the multinomial expansion of
$(p_1 + p_2 + \dots + p_m)^n$, and hence formula (4.6.3) is called the multi-
nomial law.

Now the events $A_{i_1 i_2 \dots i_m}$ ($i_1, i_2, \dots, i_m = 0, 1, 2, \dots, n$ such that
$\sum_k i_k = n$) are pairwise mutually exclusive, and

$\sum_{i_1, i_2, \dots, i_m} A_{i_1 i_2 \dots i_m} = S^n$

where the summation is over all values of $i_1, i_2, \dots, i_m$ having sum n.
We then have the necessary identity :

$P(S^n) = \sum_{i_1, i_2, \dots, i_m} P(A_{i_1 i_2 \dots i_m})$

$= \sum_{i_1, i_2, \dots, i_m} \frac{n!}{i_1!\, i_2! \dots i_m!}\, p_1^{i_1} p_2^{i_2} \dots p_m^{i_m}$

$= (p_1 + p_2 + \dots + p_m)^n = 1$


Examples 
1. A die is thrown 10 times in succession. Find the probability of the
occurrence of six 4 times, five twice, and all the other faces once each.

By (4.6.3) the answer $= \frac{10!}{4!\, 2!\, 1!\, 1!\, 1!\, 1!} \left(\frac{1}{6}\right)^{10} = 0.0013$.
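
The arithmetic of this example follows directly from (4.6.3). The sketch below evaluates the multinomial law exactly; the function name is illustrative, and the faces are taken in the order 1, 2, 3, 4, 5, 6 with counts 1, 1, 1, 1, 2, 4.

```python
from fractions import Fraction
from math import factorial

def multinomial_probability(counts, probs):
    """Multinomial law (4.6.3) for occurrence counts i_1..i_m and probabilities p_1..p_m."""
    n = sum(counts)
    coefficient = factorial(n)
    for c in counts:
        coefficient //= factorial(c)   # n! / (i_1! i_2! ... i_m!), exact at every step
    weight = Fraction(1)
    for c, p in zip(counts, probs):
        weight *= p**c
    return coefficient * weight

# Faces 1..4 once each, five twice, six four times, in 10 throws of a fair die.
answer = multinomial_probability([1, 1, 1, 1, 2, 4], [Fraction(1, 6)] * 6)
print(answer, round(float(answer), 4))
```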


2. DRAWINGS WITH REPLACEMENT. An urn contains $N = N_1
+ N_2 + \dots + N_m$ balls, of which $N_1$ are of the first colour, $N_2$ of the
second colour, ..., and $N_m$ of the mth colour, and $n = i_1 + i_2 + \dots
+ i_m$ balls are drawn successively with replacements. Find the
probability that of the balls drawn $i_1$ are of the first colour, $i_2$ of the
second colour, ... and $i_m$ of the mth colour.

Let E denote the experiment of drawing one ball from the urn
and noting its colour. Then the event space S contains m points, viz.
the m different colours, and the probability of the kth colour is
$N_k/N = p_k$ (say), (k = 1, 2, ..., m) and therefore $\sum_k p_k = 1$.

Now the n drawings will correspond to n independent repetitions
of E, so that, by (4.6.3), the required probability

$= \frac{n!}{i_1!\, i_2! \dots i_m!} \left(\frac{N_1}{N}\right)^{i_1} \left(\frac{N_2}{N}\right)^{i_2} \dots \left(\frac{N_m}{N}\right)^{i_m}$ (4.6.4)

DRAWINGS WITHOUT REPLACEMENT. This case has been treated
in Ex. 8, Sec. 3.1, and the probability is

$\binom{N_1}{i_1} \binom{N_2}{i_2} \dots \binom{N_m}{i_m} \Big/ \binom{N}{n}$

If $N \to \infty$, subject to the condition that the $p_k$'s are kept fixed, the
limiting value of the above probability will be

$\frac{n!}{i_1!\, i_2! \dots i_m!}\, p_1^{i_1} p_2^{i_2} \dots p_m^{i_m}$


4.7 INFINITE SEQUENCE OF BERNOULLI TRIALS 


Consider now an infinite sequence of Bernoulli trials, i.e. an infinite
sequence of independent trials of a basic random experiment E whose
event space S contains only two event points, viz. ‘success’ s and
‘failure’ f. Let P(s) = p and P(f) = q = 1 - p.

The event space of the whole sequence of trials then consists of
an infinite number of points such as (s, f, s, s, f, ...). The independence
of the trials may be conveniently characterised as follows. If $A_1, A_2, \dots$
is any infinite sequence of events each connected with the experiment E
(i.e. each denotes one of the four events : success, failure, impossible
event and certain event), then

$P\{(A_1, A_2, \dots)\} = P(A_1)\,P(A_2) \dots$ (4.7.1)
assuming that the infinite product on the right is convergent.

In particular, if in (4.7.1) $A_n = S$, the certain event, for n > r, then
we get

$P\{(A_1, A_2, \dots, A_r, S, S, \dots)\} = P(A_1)\,P(A_2) \dots P(A_r)$ (4.7.2)

Examples

1. In an infinite sequence of Bernoulli trials with probability of
success p, find the probability that i failures will precede the first
success.

In the required event we have failure f in the first i trials, success
s in the (i+1)th trial and the certain event S in all subsequent trials.
Hence by (4.7.2) the required probability is $q^i p$.


2. A and B toss a coin alternately and the first to obtain a head wins the toss.
If A starts the game, find the probability of his winning.

Let $A_n$ denote the event that A wins in (2n+1) tosses, i.e. a head appears in
the (2n+1)th toss but the results of the preceding 2n tosses are all tails (n = 0,
1, 2, ...). Then by (4.7.2) $P(A_n) = (\tfrac{1}{2})^{2n} \cdot \tfrac{1}{2}$. The required event is clearly $\sum A_n$
whose probability is $P(\sum A_n) = \sum P(A_n) = \sum_{n=0}^{\infty} (\tfrac{1}{2})^{2n+1} = \tfrac{2}{3}$.
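
Both examples of this section reduce to geometric terms, which can be checked with exact fractions. In the sketch below the function names are illustrative; the infinite series for A's chance of winning is replaced by a partial sum, whose closed form $\tfrac{2}{3}\{1 - (\tfrac{1}{4})^N\}$ after N terms shows the approach to 2/3.

```python
from fractions import Fraction

def first_success_after(i, p):
    """Probability q^i p that exactly i failures precede the first success."""
    return (1 - p) ** i * p

def a_wins_partial(terms):
    """Partial sum of P(A_n) = (1/2)^(2n) * (1/2) for n = 0 .. terms-1."""
    half = Fraction(1, 2)
    return sum(half ** (2 * n) * half for n in range(terms))

print(first_success_after(3, Fraction(1, 4)))
print(float(a_wins_partial(30)))  # close to 2/3
```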


4.8 MARKOV CHAINS 


Let us now consider a simple but important case of dependent trials
known as a Markov chain. Let E be a given random experiment whose
event space S contains the m points $U_1, U_2, \dots, U_m$. Consider a sequence
of n trials of E such that the outcome of any trial depends on the
outcome of the immediately preceding trial but not on the outcomes of
earlier trials. Let $E_n$ denote the compound experiment of the n trials
of E, the corresponding event space being $S^n$ which contains the
event points
$(U_{i_1}, U_{i_2}, \dots, U_{i_n}) \quad (i_1, i_2, \dots, i_n = 1, 2, \dots, m)$

Let $A_i^{(k)}$ denote the event in space $S^n$ that $U_i$ occurs at the kth
trial. Then

$P\{(U_{i_1}, U_{i_2}, \dots, U_{i_n})\} = P(A_{i_1}^{(1)} A_{i_2}^{(2)} \dots A_{i_n}^{(n)})$

$= P(A_{i_1}^{(1)})\, P(A_{i_2}^{(2)} \mid A_{i_1}^{(1)}) \dots P(A_{i_k}^{(k)} \mid A_{i_1}^{(1)} A_{i_2}^{(2)} \dots A_{i_{k-1}}^{(k-1)}) \dots P(A_{i_n}^{(n)} \mid A_{i_1}^{(1)} A_{i_2}^{(2)} \dots A_{i_{n-1}}^{(n-1)})$

Since the outcome of the kth trial depends only on that of the (k-1)th
trial, we have
$P(A_{i_k}^{(k)} \mid A_{i_1}^{(1)} A_{i_2}^{(2)} \dots A_{i_{k-1}}^{(k-1)}) = P(A_{i_k}^{(k)} \mid A_{i_{k-1}}^{(k-1)})$ (4.8.1)
and also assume that the conditional probability on R.H.S. is the same
for all trials so that it depends on the indices $i_{k-1}$ and $i_k$ only, and
we may write
$P(A_j^{(k)} \mid A_i^{(k-1)}) = p_{ij} \quad (i, j = 1, 2, \dots, m)$ (4.8.2)
It follows that
$p_{ij} \ge 0$ for all i, j
$\sum_j p_{ij} = 1$ for all i (4.8.3)

Also let
$P(A_i^{(1)}) = \pi_i$ (4.8.4)
so that

$\pi_i \ge 0$ for all i, $\sum_i \pi_i = 1$ (4.8.5)
and the product above becomes
$P\{(U_{i_1}, U_{i_2}, \dots, U_{i_n})\} = \pi_{i_1}\, p_{i_1 i_2}\, p_{i_2 i_3} \dots p_{i_{n-1} i_n}$ (4.8.6)


The above informal discussions lead to the following definition of
a Markov chain.

A sequence of n trials of a random experiment E whose event space
S contains the m event points $U_1, U_2, \dots, U_m$ is said to be a Markov
chain if numbers $\pi_i$ (i = 1, 2, ..., m) and $p_{ij}$ (i, j = 1, 2, ..., m) subject to
(4.8.3) and (4.8.5) are given such that probabilities are assigned in $S^n$,
the event space of the n trials of E, by (4.8.6).


Now starting from this formal definition, we may easily deduce
(4.8.1), (4.8.2) and (4.8.4). Summing (4.8.6) over $i_n$, we get
$P(A_{i_1}^{(1)} A_{i_2}^{(2)} \dots A_{i_{n-1}}^{(n-1)}) = P\{(U_{i_1}, U_{i_2}, \dots, U_{i_{n-1}}, S)\}$

$= \pi_{i_1}\, p_{i_1 i_2} \dots p_{i_{n-2} i_{n-1}} \sum_{i_n} p_{i_{n-1} i_n}$

$= \pi_{i_1}\, p_{i_1 i_2} \dots p_{i_{n-2} i_{n-1}}$ [by (4.8.3)]
Continuing this process of summation we have for $1 \le k \le n$
$P(A_{i_1}^{(1)} A_{i_2}^{(2)} \dots A_{i_k}^{(k)}) = \pi_{i_1}\, p_{i_1 i_2} \dots p_{i_{k-1} i_k}$ (4.8.7)

For k = 1, (4.8.7) reduces to (4.8.4). Since $\sum_i \pi_i = 1$, it follows that the
sum of the numbers on the R.H.S. of (4.8.6), which are non-negative,
is 1 which is a necessary condition for a valid assignment of probabi-
lities in $S^n$.

Replacing k by k-1 in (4.8.7) and dividing (4.8.7) by this result
we get

$P(A_{i_k}^{(k)} \mid A_{i_1}^{(1)} A_{i_2}^{(2)} \dots A_{i_{k-1}}^{(k-1)}) = p_{i_{k-1} i_k}$

Also
$P(A_{i_k}^{(k)}) = \sum \pi_{i_1}\, p_{i_1 i_2} \dots p_{i_{k-1} i_k}$ (4.8.8)
the summation being taken over $i_1, i_2, \dots, i_{k-1}$, and

$P(A_{i_{k-1}}^{(k-1)} A_{i_k}^{(k)}) = \sum \pi_{i_1}\, p_{i_1 i_2} \dots p_{i_{k-2} i_{k-1}}\, p_{i_{k-1} i_k}$

the summation being over $i_1, i_2, \dots, i_{k-2}$.

Dividing this result by that obtained by replacing k by k-1 in
(4.8.8) we get
$P(A_{i_k}^{(k)} \mid A_{i_{k-1}}^{(k-1)}) = p_{i_{k-1} i_k}$
Thus the interpretations of $\pi_i$ and $p_{ij}$ are obtained.


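The m x m matrix

$P = \begin{pmatrix} p_{11} & p_{12} & \dots & p_{1m} \\ p_{21} & p_{22} & \dots & p_{2m} \\ \dots & \dots & \dots & \dots \\ p_{m1} & p_{m2} & \dots & p_{mm} \end{pmatrix}$ (4.8.9)

is called the matrix of conditional probabilities and the row-vector
$\pi = (\pi_1, \pi_2, \dots, \pi_m)$ is said to give the initial probability distribution of
a given Markov chain.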


Note that each row sum of P is 1, and each element of P is non-
negative. Likewise, the components of $\pi$ are non-negative and their
sum is 1.

In the theory of Markov chains it is customary to adopt the follow-
ing physical analogy. Imagine a physical system which is capable of
being in one of the m states $U_1, U_2, \dots, U_m$ with different probabilities
and that the system can alter its state only at times $t_1, t_2, \dots, t_n$ (which
correspond to the n trials). The conditional probability $p_{ij}$ is then
called the probability of transition from state $U_i$ to state $U_j$, written
$U_i \to U_j$, at any time, and the matrix P is called the matrix of transi-
tion probabilities or simply the transition matrix and the vector $\pi$ is
said to give the initial probability distribution at time $t_1$.

Higher transition probabilities. We know $p_{ij}$ is the probability
of the direct transition $U_i \to U_j$. Let us now calculate the probability
of transition from $U_i$ to $U_j$ in r steps. This can materialise as the
sequence of transitions

$U_i \to U_{i_1} \to U_{i_2} \to \dots \to U_{i_{r-1}} \to U_j$

where the indices $i_1, i_2, \dots, i_{r-1}$ may vary freely.
Let the probability of transition from $U_i$ to $U_j$ in r steps be denoted
by $p_{ij}^{(r)}$ and the matrix
$P_r = (p_{ij}^{(r)}) \quad (i, j = 1, 2, \dots, m)$ (4.8.10)
By the laws of probability
$p_{ij}^{(r)} = \sum p_{i i_1}\, p_{i_1 i_2} \dots p_{i_{r-1} j}$ (4.8.11)
where the summation is taken over $i_1, i_2, \dots, i_{r-1}$. In particular

$p_{ij}^{(1)} = p_{ij}$

or $P_1 = P$ and
$p_{ij}^{(2)} = \sum_k p_{ik}\, p_{kj}$
which gives $P_2 = P^2$. Since
$p_{ij}^{(r)} = \sum_k p_{ik}\, p_{kj}^{(r-1)}$

we get $P_r = P\,P_{r-1}$. By induction it follows that

$P_r = P^r$ (4.8.12)

Let us now compute the (unconditional) probability that the system
finds itself in state $U_k$ at time $t_r$ which will be denoted by $\pi_k^{(r)}$ and let
the row-vector

$\pi^{(r)} = (\pi_1^{(r)}, \pi_2^{(r)}, \dots, \pi_m^{(r)})$

give the probability distribution at time $t_r$. Now the said transition
may occur as transition from state $U_i$ at time $t_1$ to state $U_k$ at time $t_r$
in r-1 steps where the index i may vary freely. Then
$\pi_k^{(r)} = \sum_i \pi_i\, p_{ik}^{(r-1)}$ (4.8.13)

or

$\pi^{(r)} = \pi P^{r-1}$ (4.8.14)
Example. Consider a Markov chain of coin tossings where the
first toss is given to be a fair one and the transition probabilities of
H→H, H→T, T→H, T→T (H = head, T = tail) are respectively
1/2, 1/2, 0, 1. Find the probability of (i) a run of heads in first three
tosses, (ii) tail in the 3rd toss assuming that there was head in the 1st
toss, (iii) head in the 4th toss.

Taking $U_1 = H$, $U_2 = T$, the transition matrix

$P = \begin{pmatrix} 1/2 & 1/2 \\ 0 & 1 \end{pmatrix}$

and the initial distribution vector, $\pi = (1/2, 1/2)$.
(i) By (4.8.6) the probability of a run of heads in first 3 tosses is
$P\{(H, H, H)\} = \pi_1\, p_{11}\, p_{11} = \tfrac{1}{2} \cdot \tfrac{1}{2} \cdot \tfrac{1}{2} = \tfrac{1}{8}$

(ii) The probability of tail in the 3rd toss assuming that there was head
in the 1st toss is indeed the probability of transition from H to T in 2
steps, i.e. $p_{12}^{(2)} = 3/4$.

(iii) $P^3 = \begin{pmatrix} 1/8 & 7/8 \\ 0 & 1 \end{pmatrix}$
By (4.8.14)
$\pi^{(4)} = \pi P^3 = (1/2, 1/2) \begin{pmatrix} 1/8 & 7/8 \\ 0 & 1 \end{pmatrix} = (1/16, 15/16)$,

so that the probability of head in the 4th toss is $\pi_1^{(4)} = 1/16$.
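
The three parts of this example can be reproduced with a few lines of exact matrix arithmetic. The sketch below assumes the transition probabilities H→H = H→T = 1/2, T→H = 0, T→T = 1 and the initial distribution (1/2, 1/2), values consistent with the stated result (1/16, 15/16); the helper names `mat_mul` and `mat_pow` are illustrative.

```python
from fractions import Fraction as F

def mat_mul(A, B):
    """Product of two matrices given as lists of rows."""
    return [
        [sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]

def mat_pow(P, r):
    """r-step transition matrix P_r = P^r, as in (4.8.12)."""
    result = [[F(int(i == j)) for j in range(len(P))] for i in range(len(P))]
    for _ in range(r):
        result = mat_mul(result, P)
    return result

# Assumed data: states U_1 = H, U_2 = T.
P = [[F(1, 2), F(1, 2)], [F(0), F(1)]]
pi = [F(1, 2), F(1, 2)]

run_of_three_heads = pi[0] * P[0][0] * P[0][0]   # (i), by (4.8.6)
head_to_tail_in_two = mat_pow(P, 2)[0][1]        # (ii), entry p_12 of P^2
pi4 = mat_mul([pi], mat_pow(P, 3))[0]            # (iii), by (4.8.14): pi P^3

print(run_of_three_heads, head_to_tail_in_two, pi4)
```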


4.9. EXERCISES 

1. Three cards are successively drawn from a full pack, the card drawn being 
replaced every time. Find the probability of the result : spade, heart or diamond, 
queen. Also find the corresponding probability if the order of occurrence of the 
events is ignored. 

2. The probability that A can solve a certain problem is ; and that B can solve
it is j. If both try it independently, what is the probability that it is solved ?

3. Find the probabilities of (a) 3 heads, (b) at least 3 heads and (c) at most
3 heads in 5 throws with a coin.

4. What is the probability of obtaining multiple of three twice in a throw
with 6 dice ?

5. 4 cards are drawn successively from a pack with replacements. What is 
the probability that all the cards are of the same suit ? 

6. The probability of hitting a target is 1/5. If 10 shots are fired, find the
probability of at least two hits. Find also the minimum number of shots to be
fired in order that the probability of hitting the target at least once exceeds 1/2.

7. When a defective die is thrown 10 times the probability that an even face 
occurs 5 times is twice the probability that the same event occurs 4 times. Find 
the probability that an even face will never occur in 4 throws of the same die. 

8. Suppose that the probability of a new-born baby to be a boy is 1/3. In a
family of 8 children, calculate the probability that there are 4 or 5 boys.

9. If a die is thrown n times, show that the probability of an even number of
sixes is $\tfrac{1}{2}\{1 + (2/3)^n\}$.

10. Find the most probable number of times the event—multiple of three 
occurs when a die is thrown (a) 50 times, (b) 100 times. 

11. Show that the most probable number of heads in 2n throws of a coin is n,
and that the corresponding maximum probability lies between $1/(2\sqrt{n})$ and
$1/\sqrt{2n+1}$.




12. Prove that in n Bernoulli trials with probability of failure q, the probability
of at most k successes is

$\int_0^q x^{n-k-1} (1-x)^k\,dx \Big/ \int_0^1 x^{n-k-1} (1-x)^k\,dx$

13. If a coin is tossed repeatedly, show that the probability of getting m heads
before n tails is

$\left(\tfrac{1}{2}\right)^{m+n-1} \sum_{i=m}^{m+n-1} \binom{m+n-1}{i}$

14, In a Bernoullian sequence of n trials with probability of success p, find the 
probability that the ith success occurs at the nth trial. 

15. In Banach’s match-box problem, find the probability that when the first 
box is just emptied (i.e. the last match is drawn from the first box) the second box 
contains exactly i matches. 

16. А class has only three students A, B, C who attend the class independently, 
the probabilities of their attendance on any day being 3, 3, 2 respectively. Find 
the probability that the total number of attendances in two consecutive days is 
exactly three, 

17. If a die is thrown n times, find the probability that (a) the greatest, (b) the
least number obtained will have a given value i.

18. From an urn containing n tickets numbered 1, 2, ..., n, m tickets are drawn
at a time and replaced before the next drawing. Show that the probability that in
k drawings each of the n tickets will appear at least once is

$1 - \binom{n}{1} \left(\frac{n-m}{n}\right)^k + \binom{n}{2} \left(\frac{n-m}{n}\right)^k \left(\frac{n-m-1}{n-1}\right)^k - \dots$

19. What is the probability that in a company of 500 people only one person


will have birthday on New Year's day? (Assume that a year has 365 days.) 


20. А card is drawn from a pack and replaced 260 times. Find the probability 
of obtaining queen of hearts 4 times. 


21. A system consists of 1,000 connected components, where each component 
may fail independently of the others. If the probability that a component fails in 
one month is 10-*, find the probability that the system will function (i.e. no 
component will fail) throughout a month, 


22. Prove that, in a Poisson sequence of n trials, the probability of i successes
is the coefficient of $x^i$ in the product

$(p_1 x + q_1)(p_2 x + q_2) \dots (p_n x + q_n)$
where $p_i$ denotes the probability of success in the ith trial and $q_i = 1 - p_i$.

23. Three coins having probabilities of head 1/2, 2/5, 3/7 respectively are
thrown. Find the probability of obtaining exactly one head.




24. What is the probability that the faces 1, 3, 5 turn up 2, 3, 3 times respec- 
tively in 8 throws of a die ? 

25. An urn contains 10 balls, of which 5 are white, 3 red and 2 black. If a ball
is drawn and replaced 3 times, what is the probability that the balls are of different
colours ?

26. A and B alternately throw a pair of dice, A starting the game. A wins if
he throws six before B throws seven, and B wins if he throws seven before A throws
six. What is the probability of A's winning ?

27. Three persons toss a coin in succession and the first to obtain a head wins 
the game. Find their respective chances of winning. 

28. A player repeatedly throws a coin and scores one point for a head and
two points for a tail. If $p_n$ denotes the probability of scoring n points, then show
that $2p_n = p_{n-1} + p_{n-2}$. Hence deduce an expression for $p_n$ and find its limiting
value as n tends to infinity.

29. If a day is dry, the conditional probability that the next day will also be
dry is p ; if a day is wet, the conditional probability that the next day will be dry
is p'. If $u_n$ is the probability that the nth day will be dry, prove that

$u_n - (p - p')\,u_{n-1} - p' = 0 \quad (n \ge 2)$
If the first day is sure to be dry and p- 1, p/— 1, find tn. 


30. A system having three states U,, U,, U, changes its state at times 1=0, 
1, 2..., the matrix of transition probabilities being 


. 100 
: ii) 
0331 


If it is certain that the initial state of the system is U,, find the probability of 
(i) the event that the state of the system is U, at (20, О, at 1-2 and U, at t=3 
(ii) transition from state U, at 122 to state О, at t=4, (iii) the event that the 
state is U, at t=4. 


31. Fora two-state Markov chain with transition matrix 
basa 
1-4 "-2 
and initial probability distribution (z,, 7»), calculate the probability distribution 


at the nth trial, and show that as п =» co, this distribution is independent of the 
initial distribution. 


CHAPTER 5 


PROBABILITY DISTRIBUTIONS 


5.1 MATHEMATICAL TOOL : FUNCTIONS ON SETS 


Let us be given two sets M and N. If to every element $a \in M$ there
corresponds, by some given rule, a unique element $b \in N$, then we
get a correspondence from M to N. This correspondence is called a
function, and we write b = b(a). The set of all b's, i.e. the function
values (in the abstract sense) form a subset R of N. The set M is
called the domain of definition of the function and the set R the range
of the function. We at once recognise the function of a real variable
to be a particular case of this, which gives a correspondence from the
set of real numbers to itself.

By setting up such a correspondence, we may, as it were, pass from
the set M to the set N, i.e. instead of the elements of M, we may study,
if found to be more convenient, the properties of the elements of N,
thereby taking an indirect account of the elements of M through the
given rule of correspondence. In the theory of probability, we exactly
feel such a need. We know that the results of a random experiment,
i.e. the event points are in general abstract entities, and it is rather
difficult to develop in detail a mathematical theory dealing with them.
On the other hand, we have at our disposal the well-known theories
of algebra and analysis of real numbers, and if we can pass from
the abstract event space to the set of real numbers by means of a corres-
pondence, things are expected to prove more convenient. This gives
rise to the definition of what are known as random variables.


5.2 RANDOM VARIABLES 


If corresponding to every point U of an event space S, we have, by a
given rule, a unique real value $X = X(U)$, i.e. X is a real-valued
function defined on S, then X is called a random or stochastic variable
or sometimes a variate. The range of the function X, i.e. the set of
all values which X takes, will be called the spectrum of the random
variable. The spectrum may be discrete or continuous, and accordingly
the random variable is said to be discrete or continuous.


Let A be a given set of real numbers. Then the set of all event
points U for which $X(U) \in A$ is an event which will be denoted by
$X \in A$. In particular, the event $X = a$ is the set of all event points
corresponding to which X takes the value a. Similarly, we speak of
the events $a < X < b$, $a \le X \le b$, $-\infty < X \le a$, $X \ne a$ etc.; the
event $-\infty < X < \infty$ is obviously the certain event S so that
$P(-\infty < X < \infty) = 1$.


Remarks 


1. The term random function would have been perhaps more
appropriate than random variable but is seldom used in practice.


2. HALF-OPEN INTERVALS. We know that there are three types
of intervals, viz. closed, open and half-open intervals. Now the events
$a \le X \le b$ and $b \le X \le c$ are not necessarily mutually exclusive
but have the event $X = b$ in common; also the events $a < X < b$ and
$b < X < c$ are such that their sum is not the event $a < X < c$. These
difficulties are overcome if we consider half-open intervals; the events
$a < X \le b$ and $b < X \le c$ are mutually exclusive and their sum
is the event $a < X \le c$. This is the reason why, in the theory of
probability, we shall use the rather uncommon half-open intervals.


Examples 

1. Consider the random experiment of throwing a coin. We may
define a random variable X on its event space, which contains the two
points ‘head’ and ‘tail’, by the following rule of correspondence: $X = 0$
corresponding to ‘tail’ and $X = 1$ corresponding to ‘head’. Then $X = 0$
and $X = 1$ respectively denote the events ‘tail’ and ‘head’. The spectrum
of X consists of the two points 0 and 1, and $P(X = 0) = P(X = 1) = \tfrac12$.


The inequality $-\infty < X < \infty$ denotes the entire event space, and
we may write $(-\infty < X < \infty) = (X = 0) + (X = 1)$. Since $X = 0$ and
$X = 1$ are mutually exclusive events, we must have

$$P(X = 0) + P(X = 1) = P(-\infty < X < \infty) = 1$$


which is true. 




2. Take the event space of Ex. 5 Sec. 1.4. A random variable X
may be defined by: girl → $X = 0$, boy → $X = 1$. Here $P(X = 0)$ and
$P(X = 1)$ are not necessarily $\tfrac12$ each.

3. Let the random experiment consist in throwing a die and X
denote the number on the turned up face of the die. Then X is a
random variable, the rule of correspondence being contained in the
title itself, viz.

one → $X = 1$, two → $X = 2$, ..., six → $X = 6$

The spectrum consists of the 6 points 1, 2, ..., 6, and

$$P(X = 1) = P(X = 2) = \cdots = P(X = 6) = \tfrac16$$

We may define another random variable Y as follows: one,
two → $Y = 0$; three, four → $Y = 1$; five, six → $Y = 2$. Here $Y = 0$ corresponds
to both the points ‘one’ and ‘two’ and hence denotes the event ‘one
or two’.


4. Let the experiment be drawing a card from a well-shuffled pack.
If X denotes the number of points on the card drawn (assuming 11
points for the jack, 12 for the queen and 13 for the king), then X is a
random variable defined on this event space of 52 points. X can take
the values 1, 2, ..., 13; $X = 1$ denotes the event ‘ace’ which contains
4 points, and hence $P(X = 1) = 4/52 = 1/13$ etc.


5. Consider a Bernoullian sequence of, say, 3 trials. We know
that the event space $S^3$ contains $2^3 = 8$ points. Then the number of
successes X is a random variable defined on $S^3$, the correspondence
being clearly

$$\begin{aligned}
(F, F, F) &\to X = 0 \\
(S, F, F),\ (F, S, F),\ (F, F, S) &\to X = 1 \\
(S, S, F),\ (S, F, S),\ (F, S, S) &\to X = 2 \\
(S, S, S) &\to X = 3
\end{aligned}$$

The spectrum of X consists of the 4 values 0, 1, 2, 3, and $P(X = 0) = q^3$,
$P(X = 1) = 3q^2p$, $P(X = 2) = 3qp^2$, $P(X = 3) = p^3$. The sum of these
probabilities is 1 as it must be.
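
The correspondence of Ex. 5 can be checked mechanically. The following is only a sketch: the function names and the value p = 1/3 are illustrative assumptions, not part of the text. It lists the 8 event points, treats X as a function on them, and adds up the probabilities of the points mapped to each value.

```python
from itertools import product
from fractions import Fraction

def bernoulli_space(n):
    """All 2**n event points of n trials, each trial 'S' (success) or 'F' (failure)."""
    return list(product("SF", repeat=n))

def X(point):
    """The random variable: number of successes in an event point."""
    return point.count("S")

def prob_point(point, p):
    """Probability of a single event point under independent trials."""
    q = 1 - p
    return p ** point.count("S") * q ** point.count("F")

def prob_X_equals(k, n, p):
    """P(X = k): total probability of the event points which X maps to k."""
    return sum(prob_point(u, p) for u in bernoulli_space(n) if X(u) == k)

p = Fraction(1, 3)          # illustrative value only
q = 1 - p
dist = [prob_X_equals(k, 3, p) for k in range(4)]
print(dist)                 # the four probabilities q^3, 3q^2 p, 3q p^2, p^3
```

Exact fractions are used so that the comparison with $q^3$, $3q^2p$, $3qp^2$, $p^3$ involves no rounding.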
6. If a ticket is drawn at random from an urn containing n tickets
numbered 1, 2, ..., n and X denotes the number of the ticket drawn, then
X is a random variable which can assume the values 1, 2, ..., n and

$$P(X = i) = 1/n \quad (i = 1, 2, \ldots, n)$$




7. Let r tickets be drawn successively with replacements from an
urn containing n tickets numbered 1, 2, ..., n. If X denotes the greatest
number drawn, then the spectrum of the random variable consists of
the points 1, 2, ..., n and by (4.4.7)

$$P(X = i) = \frac{i^r - (i - 1)^r}{n^r} \quad (i = 1, 2, \ldots, n)$$
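
As a check on Ex. 7, the distribution of the greatest number drawn can be obtained by brute-force enumeration and compared with the closed form $[i^r - (i-1)^r]/n^r$, which follows from $P(X \le i) = (i/n)^r$. This is a sketch under assumed small values of n and r; the function names are ours.

```python
from itertools import product
from fractions import Fraction

def max_distribution_by_enumeration(n, r):
    """P(X = i) where X is the greatest of r draws with replacement from 1..n."""
    counts = {i: 0 for i in range(1, n + 1)}
    for draws in product(range(1, n + 1), repeat=r):
        counts[max(draws)] += 1
    total = n ** r                      # number of equally likely event points
    return {i: Fraction(c, total) for i, c in counts.items()}

def max_distribution_formula(n, r):
    """Closed form [i^r - (i-1)^r] / n^r."""
    return {i: Fraction(i ** r - (i - 1) ** r, n ** r) for i in range(1, n + 1)}

enumerated = max_distribution_by_enumeration(5, 3)   # illustrative n = 5, r = 3
formula = max_distribution_formula(5, 3)
print(enumerated == formula)
```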

8. If balls are successively drawn without replacement from an
urn containing $N_1$ white and $N_2$ black balls ($N = N_1 + N_2$), then the
number of black balls preceding the first white ball is a random variable X
which can take the values 0, 1, 2, ..., $N_2$ and by (3.1.15)

$$P(X = i) = \frac{N_2 (N_2 - 1) \cdots (N_2 - i + 1)\, N_1}{N (N - 1) \cdots (N - i)} \quad (i = 1, 2, \ldots, N_2), \qquad P(X = 0) = \frac{N_1}{N}$$


9. In Ex. 3 Sec. 1.4 the number of telephone calls during a fixed 
interval of time is a random variable, the spectrum of which is the set 
of all non-negative integers 0, 1, 2,...... 

All the random variables discussed above are of the discrete type ; 
in Exs. 1—8 the spectrum is finite, and in Ex. 9 it is infinite. 

10. In Ex. 4 Sec. 1.4 the measured value X of the length of a rod
is a random variable. For a theoretical model, we may assume that X
can take up any real value, i.e. X is a continuous random variable
having the entire real axis as its spectrum.


5.3 DISTRIBUTION FUNCTION

The distribution function of a random variable X is a function of a
real variable x, to be denoted by $F_X(x)$ (the subscript X pertains to the
random variable X) or simply $F(x)$, defined in $(-\infty, \infty)$ by

$$F(x) = P(-\infty < X \le x) \tag{5.3.1}$$
Basic properties of F(x)

Let $b > a$. The events $-\infty < X \le a$ and $a < X \le b$ are mutually
exclusive, and
$$(-\infty < X \le a) + (a < X \le b) = (-\infty < X \le b)$$
so
$$P(-\infty < X \le a) + P(a < X \le b) = P(-\infty < X \le b)$$
or
$$F(a) + P(a < X \le b) = F(b)$$
or
$$F(b) - F(a) = P(a < X \le b) \tag{5.3.2}$$
1. By Axiom I $P(a < X \le b) \ge 0$ so that $F(b) \ge F(a)$ for $b > a$,
i.e. F(x) is a monotonic non-decreasing function.


2. Let $A_n$ denote the event $-\infty < X \le -n$ (n = 1, 2, ...). Then
$\{A_n\}$ is a contracting sequence of events such that $\lim A_n = O$, the
impossible event. Now
$$P(\lim A_n) = P(O) = 0$$
and
$$P(A_n) = P(-\infty < X \le -n) = F(-n)$$
so that
$$\lim P(A_n) = F(-\infty)$$
Since by (3.1.10) $\lim P(A_n) = P(\lim A_n)$, we get
$$F(-\infty) = 0 \tag{5.3.3}$$
3. Set $A_n = (-\infty < X \le n)$ (n = 1, 2, ...) so that $\{A_n\}$ is an expanding
sequence of events and $\lim A_n = (-\infty < X < \infty) = S$. Hence
$$P(\lim A_n) = P(S) = 1$$
and
$$\lim P(A_n) = \lim F(n) = F(\infty)$$
so that by (3.1.10)
$$F(\infty) = 1 \tag{5.3.4}$$
4. For any fixed point a, take $A_n = (a - \tfrac1n < X \le a)$ (n = 1, 2, ...)
so that the sequence $\{A_n\}$ is contracting and $\lim A_n = (X = a)$. Now
$$P(\lim A_n) = P(X = a)$$
$$P(A_n) = F(a) - F\!\left(a - \tfrac1n\right)$$
and so
$$\lim P(A_n) = F(a) - F(a - 0)$$
By (3.1.10)
$$F(a) - F(a - 0) = P(X = a) \tag{5.3.5}$$




5. Define a contracting sequence of events $\{A_n\}$ by:
$$A_n = \left(a < X \le a + \tfrac1n\right) \quad (n = 1, 2, \ldots)$$
Clearly, $\lim A_n = O$, the impossible event, so that
$$P(\lim A_n) = P(O) = 0$$
and
$$\lim P(A_n) = \lim \left[F\!\left(a + \tfrac1n\right) - F(a)\right] = F(a + 0) - F(a)$$
By (3.1.10)
$$F(a + 0) = F(a) \tag{5.3.6}$$


Thus the distribution function F(x) is monotonic non-decreasing
such that $F(-\infty) = 0$ and $F(\infty) = 1$; it is continuous on the right at all
points, but in case $P(X = a) > 0$, F(x) has a jump discontinuity* on the
left at $x = a$, the height of jump being equal to $P(X = a)$.


The curve у = F(x) is called the distribution curve which is obviously 
confined between the lines y=0 and y=1. 


Remark. Formula (5.3.2) shows that if the distribution function 
F(x) is given, we can find the probability that X lies in any arbitrary 
interval i.e. we may say that the distribution function completely 
determines the probability distribution and hence is of fundamental 
importance. 
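
The properties above may be illustrated on the die of Ex. 3, Sec. 5.2. The sketch below is an assumption-laden illustration (the names `F`, `prob_half_open` and `left_limit` are ours): it builds F(x) from the probability masses, uses (5.3.2) for a half-open interval, and recovers $P(X = a)$ as the jump $F(a) - F(a - 0)$ of (5.3.5).

```python
from fractions import Fraction

# Probability masses of a fair die.
masses = {i: Fraction(1, 6) for i in range(1, 7)}

def F(x):
    """Distribution function F(x) = P(-inf < X <= x)."""
    return sum(f for xi, f in masses.items() if xi <= x)

def prob_half_open(a, b):
    """P(a < X <= b), computed as F(b) - F(a) by (5.3.2)."""
    return F(b) - F(a)

def left_limit(a, eps=Fraction(1, 10**6)):
    """Stand-in for F(a - 0); exact here because eps is smaller than the
    spacing of the spectrum points and F is constant between them."""
    return F(a - eps)

print(prob_half_open(2, 5))        # mass at 3, 4, 5
print(F(3) - left_limit(3))        # jump at a spectrum point
print(F(Fraction(7, 2)) - left_limit(Fraction(7, 2)))   # no jump elsewhere
```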


* A function F(x) is said to have a jump discontinuity at x = a if both the limits
F(a - 0) and F(a + 0) exist but are unequal; F(a + 0) - F(a - 0) is called the height
of jump at that point.


Probability mass. Consider a linear mass distribution along the
x-axis, the total mass being unity. Let the distribution of mass, which
may vary from point to point, be described by a function F(x) which
is defined to be the mass on the left of and up to the point x, so that
the mass contained in any interval $a < x \le b$ is $F(b) - F(a)$. If now
F(x) is interpreted as the distribution function of a probability
distribution, then this quantity is $P(a < X \le b)$, i.e. the mass in any
interval may be identified with the probability that the random variable
lies in that interval. This conceptual mass is called probability mass,
and often it is convenient to think in terms of this mass analogue
of probability.


There are two principal types of probability distributions, viz. the
discrete and continuous types which we shall now discuss.


5.4 MATHEMATICAL TOOL : STEP FUNCTIONS


Let a function F(x) be defined in the interval $a \le x \le b$ as follows.
The interval $a \le x \le b$ is divided into m sub-intervals by a given set
of points $c_0, c_1, \ldots, c_m$ such that $a = c_0 < c_1 < \cdots < c_m = b$, and

$$F(x) = \begin{cases}
f_0, & c_0 \le x < c_1 \\
f_0 + f_1, & c_1 \le x < c_2 \\
\cdots & \cdots \\
f_0 + f_1 + \cdots + f_{m-1}, & c_{m-1} \le x < c_m \\
f_0 + f_1 + \cdots + f_{m-1} + f_m, & x = c_m
\end{cases} \tag{5.4.1}$$

where $f_0, f_1, \ldots, f_m$ are all positive constants.


The function F(x) is constant in each sub-interval $c_{k-1} \le x < c_k$
(k = 1, 2, ..., m), and we say that F(x) is piecewise constant in $a \le x \le b$.
Moreover, F(x) has a jump discontinuity at each point $c_k$, it being
continuous on the right but discontinuous on the left, and the height
of jump at $c_k$ is $F(c_k + 0) - F(c_k - 0) = f_k$. Such a function F(x) is called
a step function in (a, b) having steps of heights $f_1, f_2, \ldots, f_m$ at the
points $c_1, c_2, \ldots, c_m$ respectively. This definition may be easily
extended to a step function in $(-\infty, \infty)$, an important application of
which will be found in the next section.


5.5 DISCRETE DISTRIBUTIONS 


If the random variable X takes up a discrete set of values $\ldots, x_{-2}, x_{-1},
x_0, x_1, x_2, \ldots$ ($\ldots < x_{-2} < x_{-1} < x_0 < x_1 < x_2 < \ldots$) with probabilities

$$P(X = x_i) = f_i \quad (i = 0, \pm 1, \pm 2, \ldots) \tag{5.5.1}$$

then the distribution function is given by the following: in $x_i \le x < x_{i+1}$,

$$F(x) = P(-\infty < X \le x) = P\Big\{\sum_{\alpha=-\infty}^{i} (X = x_\alpha)\Big\}
= \sum_{\alpha=-\infty}^{i} P(X = x_\alpha) = \sum_{\alpha=-\infty}^{i} f_\alpha \quad (i = 0, \pm 1, \pm 2, \ldots) \tag{5.5.2}$$

That is, F(x) is a step function having a step of height $f_i$ at each point
$x_i$ of the spectrum.

Formal discussions. For a systematic theory, we shall follow the
practice of defining any new concept in terms of the distribution
function. Thus formally we define the distribution of a random
variable X to be discrete or discontinuous if its distribution function
F(x) is a step function having steps of heights $f_i$ (> 0) at the points $x_i$
(i = 0, ±1, ±2, ...), i.e. given by (5.5.2). In the first place, we must
see if F(x) satisfies the basic properties of a distribution function.
Now F(x) is a monotonic non-decreasing function, continuous on the
right everywhere, $F(-\infty) = 0$, and further $F(\infty) = 1$ if

$$\sum_{i=-\infty}^{\infty} f_i = 1 \tag{5.5.3}$$

Thus (5.5.3) imposes a necessary restriction on the constants $f_i$.

It will now be interesting to show how, in this case, we can uniquely 
determine the probability distribution starting from the distribution 
function. 

1. If a is not a step point, by (5.3.5) $P(X = a) = F(a) - F(a - 0) = 0$,
since F(x) is continuous at $x = a$.

For a step point $x_i$, $P(X = x_i) = F(x_i) - F(x_i - 0) = f_i > 0$. This
shows that the spectrum of X consists of the step points $x_i$ only, and
the probability mass at $x_i$ is $f_i$.


2. By (5.3.2)
$$P(a < X \le b) = \sum_{a < x_i \le b} f_i \tag{5.5.4}$$
where the summation is extended over all values of i such that
$a < x_i \le b$, i.e. over all the step points in the interval $a < x \le b$.



Probability diagram. Apart from the distribution curve y = F(x),
we may also conveniently represent a discrete distribution graphically
as follows. At each point $x_i$ of the spectrum we draw an ordinate
equal to $f_i$, the probability mass at that point; the resulting diagram is
called a probability diagram.

Examples 

1. A random variable X can assume the values −1, 0, 1 with probabilities
1/3, 1/2, 1/6 respectively. Determine the distribution function.

In $-\infty < x < -1$, $F(x) = P(O) = 0$;

in $-1 \le x < 0$, $F(x) = P(X = -1) = 1/3$;

in $0 \le x < 1$, $F(x) = P\{(X = -1) + (X = 0)\} = P(X = -1) + P(X = 0) = 1/3 + 1/2 = 5/6$;

and in $1 \le x < \infty$, $F(x) = P\{(X = -1) + (X = 0) + (X = 1)\}
= P(X = -1) + P(X = 0) + P(X = 1) = 1/3 + 1/2 + 1/6 = 1$.


2. Let
$$F(x) = \begin{cases}
0, & -\infty < x < 0 \\
1/5, & 0 \le x < 1 \\
3/5, & 1 \le x < 3 \\
1, & 3 \le x < \infty
\end{cases}$$
Show that F(x) is a possible distribution function, and determine the spectrum and
the probability masses of the distribution.

F(x) is monotonic non-decreasing everywhere, continuous on the right at every
point and $F(-\infty) = 0$, $F(\infty) = 1$. Hence it is a possible distribution function. In
fact, F(x) is a step function, the step points being 0, 1, 3 which are the points of
the spectrum, and

$$P(X = 0) = F(0) - F(0 - 0) = 1/5$$
$$P(X = 1) = F(1) - F(1 - 0) = 3/5 - 1/5 = 2/5$$
$$P(X = 3) = F(3) - F(3 - 0) = 1 - 3/5 = 2/5$$
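
The passage from a step function back to its probability masses is purely mechanical, as the following sketch suggests. The helper name `masses_from_steps` and the list layout are our own assumptions; the data are those of Example 2.

```python
from fractions import Fraction

# Step points of Example 2 and the value of F on [step point, next step point).
step_points = [0, 1, 3]
values = [Fraction(1, 5), Fraction(3, 5), Fraction(1)]

def masses_from_steps(points, vals):
    """Height of jump F(x_i) - F(x_i - 0) at every step point x_i."""
    masses = {}
    previous = Fraction(0)          # F vanishes to the left of the first step point
    for x, v in zip(points, vals):
        masses[x] = v - previous
        previous = v
    return masses

masses = masses_from_steps(step_points, values)
print(masses)
```

The masses add up to the last value of F, which must be 1 if F is to be a distribution function, in agreement with (5.5.3).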


We shall now introduce some of the well-known discrete distribu- 
tions. 


5.6 IMPORTANT DISCRETE DISTRIBUTIONS 
(a) Causal distribution. 


The spectrum consists of a single point a, and

$$P(X = a) = 1 \tag{5.6.1}$$

a is a parameter of the causal distribution, i.e. for different values of a
we get different causal distributions.


Example 1. If corresponding to every point of any event space
we assign $X = a$, then X takes up the only value a, and hence $X = a$
denotes the certain event so that $P(X = a) = 1$. Therefore, X is causally
distributed with parameter a.




(b) Binomial distribution. The spectrum consists of the n+1
points 0, 1, 2, ..., n, i.e.

$$x_i = i, \qquad f_i = \binom{n}{i} p^i (1 - p)^{n - i} \quad (i = 0, 1, 2, \ldots, n) \tag{5.6.2}$$

where n, a positive integer, and p (0 < p < 1) are the two parameters of
the binomial distribution. We note that the $f_i$'s satisfy the necessary
condition (5.5.3) (cf. Sec. 4.4).
The figures below are the distribution curve and the probability
diagram of the binomial distribution for $n = 4$, $p = \tfrac12$.


Fig. 3. Binomial Distribution Curve ($n = 4$, $p = \tfrac12$)


Fig. 4. Binomial Probability Diagram ($n = 4$, $p = \tfrac12$)




Example 2. In a Bernoullian sequence of n trials with probability
of success p, the number of successes X can take the values 0, 1, 2, ..., n
or

$$x_i = i \quad (i = 0, 1, 2, \ldots, n)$$

and by the binomial law

$$P(X = i) = f_i = \binom{n}{i} p^i (1 - p)^{n - i}$$

This shows that X is binomially distributed with parameters n and p.
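
A short numerical sketch of the binomial masses follows; the function name is ours and the parameter values n = 4, p = 1/2 are chosen only for illustration. It evaluates $f_i = \binom{n}{i} p^i (1-p)^{n-i}$ exactly and confirms the necessary condition (5.5.3).

```python
from math import comb
from fractions import Fraction

def binomial_masses(n, p):
    """Probability masses f_0, ..., f_n of the binomial (n, p) distribution."""
    return [comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)]

f = binomial_masses(4, Fraction(1, 2))
print(f)          # masses at the spectrum points 0, 1, 2, 3, 4
print(sum(f))     # total probability mass
```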


(c) Poisson distribution. The spectrum is the set of all non-negative
integers, i.e.

$$x_i = i, \qquad f_i = e^{-\mu} \frac{\mu^i}{i!} \quad (i = 0, 1, 2, \ldots) \tag{5.6.3}$$

where $\mu$ (> 0) is the only parameter of the Poisson distribution. Note
that (5.5.3) is satisfied (cf. Sec. 4.4).


Fig. 5. Poisson Probability Diagram ($\mu = 10$)


Poisson distribution as a limiting binomial distribution. It
follows from the discussions on the Poisson approximation in Sec. 4.4
that if we set $p = \mu/n$ where $\mu$ is a fixed positive number and make
$n \to \infty$, we shall get in the limit

$$x_i = i \quad (i = 0, 1, 2, \ldots)$$

and

$$f_i = e^{-\mu} \frac{\mu^i}{i!}$$

That is, we obtain the Poisson distribution as a limit of the binomial
distribution.
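
The limiting statement can be watched numerically. In this sketch (function names and the values $\mu = 2$, n = 20, 200, 2000 are assumptions for illustration) the largest difference between the binomial mass with $p = \mu/n$ and the Poisson mass shrinks as n grows.

```python
from math import comb, exp, factorial

def binomial_mass(i, n, p):
    return comb(n, i) * p**i * (1 - p)**(n - i)

def poisson_mass(i, mu):
    return exp(-mu) * mu**i / factorial(i)

def max_gap(n, mu, upto=10):
    """Largest |binomial mass - Poisson mass| over i = 0..upto when p = mu/n."""
    p = mu / n
    return max(abs(binomial_mass(i, n, p) - poisson_mass(i, mu))
               for i in range(upto + 1))

gaps = [max_gap(n, mu=2.0) for n in (20, 200, 2000)]
print(gaps)       # the gaps decrease as n increases
```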


Examples 

3. The number of successes in a Bernoullian sequence of n trials
with probability of success p, where n is very large and p very small
such that $\mu = np$ is of moderate magnitude, has an approximate Poisson
distribution having parameter $\mu$.

4. There is, however, no reason to believe that the Poisson dis- 
tribution can occur only as an approximation to the binomial distribu- 
tion. It may also arise as itself in connection with various practical 
problems, and we cite the following example to illustrate this point. 

POISSON PROCESS. A family of random variables X(t) which
depends parametrically on time t is usually called a stochastic
process. A particular example of a stochastic process is the Poisson
process, in which we are concerned with counting the number of
changes (a general name) in a given interval of time, and which obeys
the following two laws:

1. The number of changes during the interval (t, t+h) is independent
of the number of changes already occurred in (0, t) for all t and
h (> 0).

2. The probability of exactly one change in (t, t+h) is $\lambda h + o(h)$*
where $\lambda$ is a given positive constant, and that of more than one change
in the same interval is o(h).


If the random variable X(t) denotes the number of changes
during the interval (0, t), then we shall prove that X(t) has a Poisson
distribution.


* o(h) means a function of h such that $o(h)/h \to 0$ as $h \to 0$.




Proof. Clearly X(t) can take the values 0, 1, 2, ..., and we write
for convenience

$$P\{X(t) = i\} = P_i(t) \quad (i = 0, 1, 2, \ldots)$$

Consider two successive intervals (0, t) and (t, t+h) (h > 0).
Let E denote the random experiment of counting the number of
changes in (0, t) and E' that in (t, t+h); E and E' together form the
compound experiment E'', which then consists in counting the number
of changes in (0, t+h). We shall interpret the first law in the sense
that the random experiments E and E' are independent.


Now the event $X(t+h) = i$ ($i \ge 1$) connected with E'' may be
written as the sum of the following three pairwise mutually exclusive
events:


(a) i changes in E, i.e. $X(t) = i$, and no change in E'
(b) i − 1 changes in E, i.e. $X(t) = i - 1$, and one change in E'
(c) more than one change in E' such that the total number of
changes in E'' is i.


Since E and E' are independent, we have by (4.2.6) and the second
law

$$P_i(t+h) = P_i(t)\{1 - \lambda h + o(h)\} + P_{i-1}(t)\{\lambda h + o(h)\} + o(h)$$
or
$$P_i(t+h) - P_i(t) = -\lambda h P_i(t) + \lambda h P_{i-1}(t) + o(h)$$
Dividing by h and passing to the limit as $h \to 0$ we have
$$P_i'(t) = -\lambda P_i(t) + \lambda P_{i-1}(t) \quad (i \ge 1) \tag{i}$$
Now consider the case i = 0. The event $X(t+h) = 0$ can result in
only one way, viz. no change in E, i.e. $X(t) = 0$, as well as no change
in E'. Hence
$$P_0(t+h) = P_0(t)\{1 - \lambda h + o(h)\}$$
which gives
$$P_0'(t) = -\lambda P_0(t) \tag{ii}$$
We assume the obvious initial conditions
$$P_0(0) = 1 \tag{iii}$$
$$P_i(0) = 0 \quad (i \ge 1) \tag{iv}$$


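Solving (ii) with the initial condition (iii) we get
$$P_0(t) = e^{-\lambda t} \tag{v}$$
Set
$$P_i(t) = e^{-\lambda t} Q_i(t) \quad (i = 0, 1, 2, \ldots) \tag{vi}$$
From (v)
$$Q_0(t) = 1 \tag{vii}$$
and (iv) gives
$$Q_i(0) = 0 \quad (i \ge 1) \tag{viii}$$
Substituting (vi) in (i) we get the simple equation
$$Q_i'(t) = \lambda Q_{i-1}(t) \tag{ix}$$
We have $Q_1'(t) = \lambda Q_0(t) = \lambda$ so that using (viii)
$$Q_1(t) = \lambda t$$
Then $Q_2'(t) = \lambda Q_1(t) = \lambda^2 t$. On integration
$$Q_2(t) = \frac{(\lambda t)^2}{2!}$$
Again $Q_3'(t) = \lambda Q_2(t) = \lambda \frac{(\lambda t)^2}{2!}$. Hence
$$Q_3(t) = \frac{(\lambda t)^3}{3!}$$
etc. In general
$$Q_i(t) = \frac{(\lambda t)^i}{i!}$$
By (vi)
$$P_i(t) = e^{-\lambda t} \frac{(\lambda t)^i}{i!}$$
or
$$P\{X(t) = i\} = e^{-\lambda t} \frac{(\lambda t)^i}{i!} \quad (i = 0, 1, 2, \ldots)$$
which shows that X(t) is Poisson distributed with parameter $\lambda t$.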
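
The differential equations $P_0'(t) = -\lambda P_0(t)$, $P_i'(t) = -\lambda P_i(t) + \lambda P_{i-1}(t)$ with $P_0(0) = 1$, $P_i(0) = 0$ may also be integrated numerically and compared with the Poisson masses. The sketch below uses a plain Euler scheme, which is our choice and not the method of the text; the values $\lambda = 2.5$, t = 1, the truncation at i = 20 and the step count are assumptions.

```python
from math import exp, factorial

def integrate_poisson_process(lam, t_end, max_i=20, steps=20000):
    """Euler integration of P_0' = -lam P_0, P_i' = -lam P_i + lam P_{i-1}."""
    h = t_end / steps
    P = [1.0] + [0.0] * max_i            # P_0(0) = 1, P_i(0) = 0 for i >= 1
    for _ in range(steps):
        new_P = [P[0] - lam * h * P[0]]
        for i in range(1, max_i + 1):
            new_P.append(P[i] + h * (-lam * P[i] + lam * P[i - 1]))
        P = new_P
    return P

def poisson_mass(i, mu):
    return exp(-mu) * mu**i / factorial(i)

lam, t = 2.5, 1.0                        # illustrative values
numeric = integrate_poisson_process(lam, t)
exact = [poisson_mass(i, lam * t) for i in range(len(numeric))]
max_error = max(abs(a - b) for a, b in zip(numeric, exact))
print(max_error)                         # small when the step h is small
```

Since each equation involves only $P_i$ and $P_{i-1}$, cutting the system off at a finite index does not disturb the lower-order solutions.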


The above laws for the Poisson process are plausibly satisfied in
many cases of practical importance. For example, the number of
telephone calls on a trunk line in a given interval of time, the number
of car accidents on a particular road in a given interval, the number
of suicides in a particular locality in a given interval, the number of
radioactive atoms disintegrating in a given sample of a radioactive
substance in a given interval, the number of cosmic ray particles
counted by a Geiger-Mueller counter in a given interval etc. may all
be taken to be Poisson processes.


It will be proved in Chapter 7 that the parameter $\mu$ of the Poisson
distribution represents what is called the average value or the mean
value of the random variable. Here the average value is meant in
the stochastic or probabilistic sense. Thus for a Poisson process
the average number of changes in a given time interval (0, t) is $\lambda t$,
so that the constant $\lambda$ may be interpreted as the average number of
changes per unit time. Anticipating this result, let us solve the
following example.


Example 5. A radioactive source emits on the average 2.5 particles per second.
Calculate the probability that 2 or more particles will be emitted in an interval
of 4 seconds.


Here $\lambda = 2.5$ so that the number of particles emitted in an interval of 4
seconds is Poisson distributed with parameter $\lambda t = 10$. Hence the required
probability is

$$P(X \ge 2) = 1 - P(X < 2) = 1 - P(X = 0) - P(X = 1) = 1 - e^{-10} - 10e^{-10} = 1 - 11e^{-10}$$
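
For a numerical value of the answer to Example 5, a two-line evaluation suffices; the variable names below are ours.

```python
from math import exp

lam, t = 2.5, 4.0                 # rate per second and length of interval
mu = lam * t                      # Poisson parameter, here 10
p_at_least_two = 1.0 - exp(-mu) - mu * exp(-mu)
print(p_at_least_two)             # roughly 0.9995
```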


Remark. In the Poisson process t may also denote any parameter
other than time.


5.7 CONTINUOUS DISTRIBUTIONS


The distribution of a random variable X is said to be continuous if
the distribution function F(x) is continuous and its derivative $F'(x)$
is piecewise continuous everywhere, which means that $F'(x)$ can have
jump discontinuities at some points such that there are at most a
finite number of them in any finite interval.


1. Since F(x) is continuous at any point a, by (5.3.5)
$$P(X = a) = F(a) - F(a - 0) = 0$$

2. By (5.3.2) $P(a < X \le b) = F(b) - F(a) = \int_a^b F'(x)\,dx$ or

$$P(a < X \le b) = \int_a^b f(x)\,dx \tag{5.7.1}$$

where

$$f(x) = F'(x) \tag{5.7.2}$$

The function f(x) is called the probability density function of the
random variable X.


3. In (5.7.1) making $a \to -\infty$ and replacing b by x, we have

$$F(x) = \int_{-\infty}^{x} f(x)\,dx \tag{5.7.3}$$

4. Since F(x) is monotonic non-decreasing, it follows that
$$f(x) \ge 0 \text{ everywhere} \tag{5.7.4}$$
and by (5.3.4)
$$\int_{-\infty}^{\infty} f(x)\,dx = 1 \tag{5.7.5}$$

(5.7.4) and (5.7.5) are necessary restrictions for a possible density
function.
5. In differential notation we write
$$P(x < X \le x + dx) = F(x + dx) - F(x) = dF(x) = F'(x)\,dx = f(x)\,dx \tag{5.7.6}$$

This differential is called the probability differential or probability
element of X or of the corresponding distribution.


Density curve. The curve y = f(x) is called the density curve
which gives a useful graphical representation in the continuous case.


Examples 
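1. Find the value of the constant K such that

$$f(x) = \begin{cases} Kx(1 - x), & 0 < x < 1 \\ 0, & \text{elsewhere} \end{cases}$$

is a probability density function. Construct the distribution function and
compute $P(X > \tfrac12)$.

By (5.7.5)
$$1 = \int_{-\infty}^{\infty} f(x)\,dx = K \int_0^1 x(1 - x)\,dx = K/6 \quad \text{or} \quad K = 6$$

The distribution function F(x) is given by (5.7.3). In $-\infty < x < 0$, $F(x) = 0$;
in $0 \le x < 1$
$$F(x) = 6 \int_0^x x(1 - x)\,dx = x^2(3 - 2x)$$
and in $1 \le x < \infty$
$$F(x) = 6 \int_0^1 x(1 - x)\,dx = 1$$
$$P(X > \tfrac12) = \int_{1/2}^{\infty} f(x)\,dx = 6 \int_{1/2}^{1} x(1 - x)\,dx = \tfrac12$$

The last result may also be obtained by using F(x) as follows:
$$P(X > \tfrac12) = 1 - P(X \le \tfrac12) = 1 - F(\tfrac12) = \tfrac12$$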
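
The figures of the preceding example can be confirmed numerically. The sketch below uses a simple midpoint rule of our own choosing together with the closed form $F(x) = x^2(3 - 2x)$; all names are illustrative.

```python
from fractions import Fraction

def density(x):
    """f(x) = 6x(1 - x) on (0, 1), zero elsewhere."""
    return 6 * x * (1 - x) if 0 < x < 1 else 0.0

def F(x):
    """Distribution function belonging to the density above."""
    if x < 0:
        return Fraction(0)
    if x < 1:
        return x * x * (3 - 2 * x)
    return Fraction(1)

def midpoint_integral(f, a, b, n=10000):
    """Midpoint-rule approximation to the integral of f over (a, b)."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

total_mass = midpoint_integral(density, 0.0, 1.0)      # condition (5.7.5)
upper_half = midpoint_integral(density, 0.5, 1.0)      # P(X > 1/2) by (5.7.1)
by_F = 1 - F(Fraction(1, 2))                           # P(X > 1/2) from F
print(total_mass, upper_half, by_F)
```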
2. Show that
$$F(x) = \begin{cases} 0, & -\infty < x < 0 \\ 1 - e^{-x}, & 0 \le x < \infty \end{cases}$$
is a possible distribution function, and find the density function.


Since F(x) is monotonic non-decreasing and continuous everywhere, $F(-\infty) = 0$
and $F(\infty) = 1$, F(x) is a possible distribution function. By (5.7.2) we get on
differentiation

$$f(x) = \begin{cases} e^{-x}, & 0 < x < \infty \\ 0, & \text{elsewhere} \end{cases}$$


5.8 IMPORTANT CONTINUOUS DISTRIBUTIONS 

(a) Rectangular or uniform distribution. As the name uniform 
distribution implies, the density function in this case is constant in a
given interval and vanishes outside it, i.e.

$$f(x) = \begin{cases} 0, & -\infty < x < a \\ \dfrac{1}{b - a}, & a < x < b \\ 0, & b < x < \infty \end{cases} \tag{5.8.1}$$


a, b ( > a) are the two parameters of the distribution. The necessary 
condition (5.7.5) is obviously satisfied. The density curve is shown in
Fig. 6, from the shape of which the other name rectangular distribution
is derived.


The distribution function may be easily calculated by (5.7.3) which
gives

$$F(x) = \begin{cases} 0, & -\infty < x < a \\ \dfrac{x - a}{b - a}, & a \le x \le b \\ 1, & b < x < \infty \end{cases} \tag{5.8.2}$$

The distribution curve is shown in Fig. 7.


Fig. 6. Rectangular Density Curve

Fig. 7. Rectangular Distribution Curve


Example 1. A point X is chosen at random in the interval
$a \le x \le b$ in such a way that the probability that it lies in any
sub-interval is proportional to the length of the sub-interval.
Then we can show that X is uniformly distributed over the
interval (a, b).

Proof. Let us construct the distribution function F(x). From
the conditions of the problem

$$F(x) = \begin{cases} 0, & -\infty < x < a \\ \lambda(x - a), & a \le x \le b, \ \lambda \text{ being a constant} \\ 1, & b < x < \infty \end{cases}$$

Since $F(b + 0) = F(b)$, $\lambda = 1/(b - a)$. Hence the result.




Another method. We may also find the density function f(x) by
making use of the probability differential. We have

$$f(x)\,dx = \begin{cases} \lambda\,dx, & a < x < b \\ 0, & \text{elsewhere} \end{cases}$$

By (5.7.5) $\lambda = 1/(b - a)$, which proves the result.


Remark. Often, for brevity, we shall use the phrase a point is 
chosen at random in a given interval to mean that the probability of 
its occurrence in any sub-interval is proportional to the length of the 
sub-interval, i.e. the random point has a uniform distribution in the 
given interval. 


(b) Normal distribution. This is defined by

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x - m)^2}{2\sigma^2}} \quad (-\infty < x < \infty) \tag{5.8.3}$$

m, $\sigma$ (> 0) being the parameters of the normal distribution. We
shall say that the distribution or the random variable is normal (m, $\sigma$).
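We have

$$\int_{-\infty}^{\infty} f(x)\,dx = \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{\infty} e^{-\frac{(x - m)^2}{2\sigma^2}}\,dx
= \frac{1}{\sqrt{\pi}} \int_{-\infty}^{\infty} e^{-t^2}\,dt
= \frac{2}{\sqrt{\pi}} \int_0^{\infty} e^{-t^2}\,dt
= \frac{1}{\sqrt{\pi}}\, \Gamma(\tfrac12) = 1$$

on putting $t = (x - m)/(\sqrt{2}\,\sigma)$. Hence the condition (5.7.5) is fulfilled. By (5.7.3)

$$F(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{x} e^{-\frac{(x - m)^2}{2\sigma^2}}\,dx \tag{5.8.4}$$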

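The particular case given by $m = 0$, $\sigma = 1$ is very important and is
called the standard normal distribution. The standard normal density
and distribution functions will be denoted by the special symbols $\phi(x)$
and $\Phi(x)$ respectively which are given by

$$\phi(x) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{x^2}{2}}, \qquad \Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-\frac{x^2}{2}}\,dx \tag{5.8.5}$$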



The standard normal density and distribution curves are shown below 


Fig. 8. Standard Normal Density Curve


Fig. 9. Standard Normal Distribution Curve


The normal distribution is of great importance in the theory of 
probability, and we shall have occasion to study this distribution in 


detail in the course of our theory. 
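
For numerical work the standard normal distribution function is usually evaluated through the error function, using the identity $\Phi(x) = \tfrac12\{1 + \operatorname{erf}(x/\sqrt2)\}$. The following sketch relies on that identity and on Python's `math.erf`; the function names `phi`, `Phi` and `normal_cdf` are our own.

```python
from math import erf, exp, pi, sqrt

def phi(x):
    """Standard normal density."""
    return exp(-x * x / 2) / sqrt(2 * pi)

def Phi(x):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1 + erf(x / sqrt(2)))

def normal_cdf(x, m, sigma):
    """Distribution function of a normal (m, sigma) variate, obtained by
    standardising: (X - m) / sigma is standard normal."""
    return Phi((x - m) / sigma)

# The derivative of Phi should reproduce the density phi.
h = 1e-5
slope_at_1 = (Phi(1 + h) - Phi(1 - h)) / (2 * h)
print(Phi(0.0), slope_at_1, phi(1.0))
```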


(c) Cauchy distribution 
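$$f(x) = \frac{1}{\pi} \cdot \frac{\lambda}{\lambda^2 + (x - \mu)^2} \quad (-\infty < x < \infty) \tag{5.8.6}$$

$\lambda$ (> 0) and $\mu$ being parameters. It may be easily verified that this
density function satisfies (5.7.5). By (5.7.3)

$$F(x) = \frac{\lambda}{\pi} \int_{-\infty}^{x} \frac{dx}{\lambda^2 + (x - \mu)^2}$$

or

$$F(x) = \frac{1}{\pi} \tan^{-1}\!\left(\frac{x - \mu}{\lambda}\right) + \frac12 \tag{5.8.7}$$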



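(d) Gamma distribution. The spectrum of the gamma distribution
is the positive half of the real axis, and the density function is
given by

$$f(x) = \begin{cases} \dfrac{e^{-x} x^{l - 1}}{\Gamma(l)}, & 0 < x < \infty \\ 0, & \text{elsewhere} \end{cases} \tag{5.8.8}$$

l (> 0) being the only parameter.


We know $\Gamma(l) = \int_0^{\infty} e^{-x} x^{l - 1}\,dx$, and so

$$\int_{-\infty}^{\infty} f(x)\,dx = \frac{1}{\Gamma(l)} \int_0^{\infty} e^{-x} x^{l - 1}\,dx = 1$$

The random variable in question is often conveniently referred to as
a $\gamma(l)$ variate.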
(e) Beta distribution of the first kind. The spectrum in this 
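case consists of the interval (0, 1) and

$$f(x) = \begin{cases} \dfrac{x^{l - 1} (1 - x)^{m - 1}}{B(l, m)}, & 0 < x < 1 \\ 0, & \text{elsewhere} \end{cases} \tag{5.8.9}$$

The parameters are l (> 0), m (> 0), and the random variable is
called a $\beta_1(l, m)$ variate. Now

$$\int_{-\infty}^{\infty} f(x)\,dx = \frac{1}{B(l, m)} \int_0^1 x^{l - 1} (1 - x)^{m - 1}\,dx = \frac{B(l, m)}{B(l, m)} = 1$$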
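(f) Beta distribution of the second kind

$$f(x) = \begin{cases} \dfrac{x^{l - 1}}{B(l, m)\,(1 + x)^{l + m}}, & 0 < x < \infty \quad (l > 0,\ m > 0) \\ 0, & \text{elsewhere} \end{cases} \tag{5.8.10}$$

The random variable will be called a $\beta_2(l, m)$ variate. Verify (5.7.5)
remembering that we may also write

$$B(l, m) = \int_0^{\infty} \frac{x^{l - 1}\,dx}{(1 + x)^{l + m}}$$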
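
Condition (5.7.5) for the gamma and the two beta densities can be checked numerically. The sketch below is illustrative only: it uses $B(l, m) = \Gamma(l)\Gamma(m)/\Gamma(l + m)$, a midpoint rule of our own choosing, arbitrarily chosen parameter values, and finite upper limits in place of infinity (the neglected tails are negligible for these values).

```python
from math import exp, gamma

def beta_fn(l, m):
    """B(l, m) expressed through the gamma function."""
    return gamma(l) * gamma(m) / gamma(l + m)

def gamma_density(x, l):
    return exp(-x) * x ** (l - 1) / gamma(l) if x > 0 else 0.0

def beta1_density(x, l, m):
    return x ** (l - 1) * (1 - x) ** (m - 1) / beta_fn(l, m) if 0 < x < 1 else 0.0

def beta2_density(x, l, m):
    return x ** (l - 1) / (beta_fn(l, m) * (1 + x) ** (l + m)) if x > 0 else 0.0

def midpoint_integral(f, a, b, n=100000):
    """Midpoint-rule approximation to the integral of f over (a, b)."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

total_gamma = midpoint_integral(lambda x: gamma_density(x, 2.5), 0.0, 60.0)
total_beta1 = midpoint_integral(lambda x: beta1_density(x, 2.0, 3.0), 0.0, 1.0)
total_beta2 = midpoint_integral(lambda x: beta2_density(x, 2.0, 3.0), 0.0, 200.0)
print(total_gamma, total_beta1, total_beta2)    # each close to 1
```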




5.9 TRANSFORMATION OF RANDOM VARIABLES 

Let y = g(x) be a given function of x and X a random variable defined
on an event space S. Then Y = g(X) is also a random variable defined
on S, for corresponding to every event point of S we have a value of
X, and for this value of X we get a value of Y = g(X). Our problem
in this section will be, given the distribution of X, to find that of Y.
It can be proved in general that if g(x) is any continuous function, the
distribution of Y is uniquely determined by that of X. The proof of
this theorem is rather difficult and beyond the scope of this book.
We shall instead consider the following important particular cases.


Continuous case. Let y = g(x) be a continuously differentiable
function which is strictly monotonic, i.e. either $\frac{dy}{dx} > 0$ or $\frac{dy}{dx} < 0$
everywhere, so that the inverse function $x = g^{-1}(y)$ exists uniquely.


Case I. $\frac{dy}{dx} > 0$. Since y = g(x) is strictly monotonic increasing,
the inverse function is also strictly monotonic increasing, and hence
$X \le x$ will imply and will be implied by $g(X) \le g(x)$ or $Y \le y$, i.e.
the events $X \le x$ and $Y \le y$ are identical. So $P(X \le x) = P(Y \le y)$ or
$$F_X(x) = F_Y(y)$$
Taking differentials
$$dF_X(x) = dF_Y(y) = dF \text{ (say)}$$
or
$$dF = f_Y(y)\,dy = f_X(x)\,dx = f_X(x)\,\frac{dx}{dy}\,dy$$
Hence
$$f_Y(y) = f_X(x)\,\frac{dx}{dy} \tag{5.9.1}$$

Case II. $\frac{dy}{dx} < 0$, i.e. g(x) is a strictly monotonic decreasing
function. In this case
$$(X \le x) = \{g(X) \ge g(x)\} = (Y \ge y)$$
Hence
$$P(X \le x) = P(Y \ge y) = 1 - P(Y < y)$$
or
$$F_X(x) = 1 - F_Y(y)$$
Therefore
$$dF_X(x) = -dF_Y(y)$$
or
$$f_Y(y)\,dy = -f_X(x)\,dx = -f_X(x)\,\frac{dx}{dy}\,dy$$
so that
$$f_Y(y) = -f_X(x)\,\frac{dx}{dy} \tag{5.9.2}$$
The formulae (5.9.1) and (5.9.2) may be put together as
$$f_Y(y) = f_X(x) \left|\frac{dx}{dy}\right| \tag{5.9.3}$$

Since in either case a unique inverse function $x = g^{-1}(y)$ exists, the
R.H.S. of (5.9.3) may be expressed as a single-valued function of y.

Discrete case. The discrete case is much simpler. Here the
spectrum only changes, the corresponding probability masses remaining
the same.


Let y = g(x) be a continuous and strictly monotonic function so
that a unique inverse function $x = g^{-1}(y)$ exists. Set
$$y_i = g(x_i) \tag{5.9.4}$$
Since the transformation has a unique inverse
$$(X = x_i) = \{g(X) = g(x_i)\} = (Y = y_i)$$

Hence the spectrum of Y consists of the points $y_i$ given by (5.9.4),
and $P(X = x_i) = P(Y = y_i)$ or
$$f_{y_i} = f_{x_i} \tag{5.9.5}$$
Examples 
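1. The random variable X is normal (m, $\sigma$). Find the distribution
of Y = aX + b where a ($\ne 0$), b are constants.

Set y = ax + b. Hence $\frac{dy}{dx} = a$, a constant; $x = \frac{y - b}{a}$.
As x ranges from $-\infty$ to $\infty$, y also ranges over the same interval.
Here

$$f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x - m)^2}{2\sigma^2}}, \qquad \left|\frac{dx}{dy}\right| = \frac{1}{|a|}$$

By (5.9.3)

$$f_Y(y) = \frac{1}{\sqrt{2\pi}\,|a|\,\sigma}\, e^{-\frac{(x - m)^2}{2\sigma^2}}
= \frac{1}{\sqrt{2\pi}\,|a|\,\sigma}\, e^{-\frac{(y - am - b)^2}{2a^2\sigma^2}} \quad (-\infty < y < \infty)$$

Hence Y is normal (am + b, $|a|\sigma$).
In particular, the random variable $\frac{X - m}{\sigma}$ is standard normal.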


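2. If X is a $\beta_2(l, m)$ variate, then show that Y = 1/X is a $\beta_2(m, l)$
variate.


Proof. For the running variable, set y = 1/x. Then
$$\frac{dy}{dx} = -\frac{1}{x^2} < 0$$

$$f_Y(y)\,dy = f_X(x)\,|dx| = \frac{1}{B(l, m)} \cdot \frac{x^{l - 1}}{(1 + x)^{l + m}} \cdot \frac{dy}{y^2}
= \frac{y^{m - 1}\,dy}{B(m, l)\,(1 + y)^{l + m}} \quad (0 < y < \infty)$$

which proves the theorem.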
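
Formula (5.9.3) lends itself to a direct numerical check on this example. In the sketch below the helper `transformed_density` and the parameter values l = 2, m = 3.5 are our own assumptions; it evaluates $f_X(x)\,|dx/dy|$ at $x = 1/y$ and compares the result with the $\beta_2(m, l)$ density.

```python
from math import gamma

def beta_fn(l, m):
    """B(l, m) expressed through the gamma function."""
    return gamma(l) * gamma(m) / gamma(l + m)

def beta2_density(x, l, m):
    """Density (5.8.10) of a beta variate of the second kind."""
    return x ** (l - 1) / (beta_fn(l, m) * (1 + x) ** (l + m)) if x > 0 else 0.0

def transformed_density(f_x, g_inverse, dx_dy, y):
    """Density of Y = g(X) at y by (5.9.3): f_X(x) |dx/dy| with x = g^{-1}(y)."""
    return f_x(g_inverse(y)) * abs(dx_dy(y))

l, m = 2.0, 3.5                                   # illustrative parameters

def f_y(y):
    """Density of Y = 1/X when X is a beta_2(l, m) variate."""
    return transformed_density(lambda x: beta2_density(x, l, m),
                               lambda t: 1.0 / t,         # x = 1/y
                               lambda t: -1.0 / t ** 2,   # dx/dy
                               y)

for y in (0.2, 1.0, 4.0):
    print(y, f_y(y), beta2_density(y, m, l))      # the two columns agree
```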


3. If X is a standard normal variate, then prove that $Y = \tfrac12 X^2$ is
a $\gamma(\tfrac12)$ variate.


Proof. Set $y = \tfrac12 x^2$. Then $\frac{dy}{dx} = x$, which shows that y is not a
strictly monotonic function everywhere. Moreover, as x ranges from
$-\infty$ to $\infty$, y ranges only from 0 to $\infty$ traversing the interval twice in
opposite directions. This presents some difficulties, and the formula
(5.9.3) is not at once applicable. We may, however, solve the
problem by the following special artifice.


Let x > 0. The event
$$(y < Y \le y + dy) = \{x^2 < X^2 \le (x + dx)^2\} = (-x - dx \le X < -x) + (x < X \le x + dx)$$
From symmetry
$$P(-x - dx \le X < -x) = P(x < X \le x + dx)$$
so that
$$P(y < Y \le y + dy) = 2P(x < X \le x + dx)$$
or
$$f_Y(y)\,dy = 2 f_X(x)\,dx = 2 f_X(x)\,\frac{dx}{dy}\,dy$$
Hence
$$f_Y(y) = 2 \cdot \frac{1}{\sqrt{2\pi}}\, e^{-y} \cdot \frac{1}{\sqrt{2y}} = \frac{e^{-y}\, y^{-\frac12}}{\Gamma(\tfrac12)} \quad (0 < y < \infty)$$
That is, Y is a $\gamma(\tfrac12)$ variate.
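
The result may be examined by simulation. This is a rough sketch: the sample size, the seed and the function names are arbitrary choices of ours. It uses the fact that the distribution function of the $\gamma(\tfrac12)$ density, on substituting $y = u^2$ in the integral, becomes $\operatorname{erf}(\sqrt{y})$, and compares this with the observed proportion of sampled values of $\tfrac12 X^2$ not exceeding y.

```python
import random
from math import erf, sqrt

def empirical_cdf_of_half_square(y, samples):
    """Proportion of sampled values of X**2 / 2 that do not exceed y."""
    return sum(1 for x in samples if 0.5 * x * x <= y) / len(samples)

def gamma_half_cdf(y):
    """Distribution function of the gamma(1/2) density, equal to erf(sqrt(y))."""
    return erf(sqrt(y))

rng = random.Random(0)                                   # fixed seed for repeatability
samples = [rng.gauss(0.0, 1.0) for _ in range(100000)]   # standard normal values

for y in (0.1, 0.5, 2.0):
    print(y, empirical_cdf_of_half_square(y, samples), gamma_half_cdf(y))
```

The agreement is only statistical; the difference between the two columns is of the order of the reciprocal of the square root of the sample size.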
4. If X is a binomial (n, p) variate, find the distribution of the
linear function Y = aX + b.
We know
$$x_i = i \quad (i = 0, 1, 2, \ldots, n)$$
By (5.9.4) the spectrum of Y is given by
$$y_i = ai + b \quad (i = 0, 1, 2, \ldots, n)$$
and by (5.9.5)
$$f_{y_i} = f_{x_i} = \binom{n}{i} p^i (1 - p)^{n - i}$$


5.10 EXERCISES 


1. If F(x) denotes the distribution function of a random variable X, then
show that
$$P(a < X < b) = F(b - 0) - F(a)$$
$$P(a \le X \le b) = F(b) - F(a - 0)$$
2. Let F(x) be a distribution function. Prove that, for any fixed $h \ne 0$, the
function
$$G(x) = \frac{1}{2h} \int_{x - h}^{x + h} F(t)\,dt$$
is also a distribution function.




3. The spectrum of the random variable X consists of the points 1, 2,...... 
and P(X=i) is proportional to 1/i(i+1). Determine the distribution function of 
X. Compute P(3 < X < n) and P(X > 5). 


4. The distribution function F(x) of a variate X is defined as follows:
$$F(x) = \begin{cases} A, & -\infty < x < -1 \\ B, & -1 \le x < 0 \\ C, & 0 \le x < 2 \\ D, & 2 \le x < \infty \end{cases}$$
where A, B, C, D are constants. Determine the values of A, B, C, D, it being
given that $P(X = 0) = 1/6$ and $P(X > 1) = 2/3$.
5. A number is chosen at random from each of the two sets 0, 1, 2, 3 and 


0,1,2,3. Find the probability distribution of the random variable denoting the 
sum of the numbers chosen. 


6. Five balls are drawn from an urn containing 4 white and 6 black balls. 
Find the probability distribution of the number of white balls drawn when the 
balls are drawn (a) with replacements, (b) without replacements.

7. In Banach's match-box problem (Ex. 5 Sec. 4.4) find the distribution of 
the number of matches left in one of the boxes when the other box is just found 
empty. 

8. Find the probability distribution of the number of failures preceding the 
first success in an infinite sequence of Bernoulli trials with probability of success p. 


9. If X is a binomial (n, p) variate, then show that
$$P(X \le k) = \int_0^q x^{n-k-1}(1-x)^k\,dx \Big/ \int_0^1 x^{n-k-1}(1-x)^k\,dx$$
where q = 1 - p and k is an integer such that $0 \le k \le n-1$.

10. If X is Poisson distributed with parameter μ, then prove that
$$P(X \le n) = \frac{1}{n!}\int_\mu^\infty e^{-x}x^n\,dx$$
where n is any positive integer.


11. If there is a war every 15 years on the average, find the probability that there will be no war in 25 years.

12. There are 500 misprints in a book of 500 pages. What is the probability that a given page will contain at most 3 misprints ?

13. 100 litres of water are supposed to be polluted with 10° bacteria. Find the probability that a sample of 1 c.c. of the same water is free from bacteria.




14. Show that a function which is |x| in (—1, 1) and zero elsewhere is a 
possible probability density function, and find the corresponding distribution 
function. 

15. Show that a function f(x) given by
f(x) = x      0 < x < 1
     = k - x  1 < x < 2
     = 0      elsewhere
is a probability density function for a suitable value of the constant k. Calculate the probability that the random variable lies between 1/2 and 3/2.

16. The probability density function of a random variable X is A sech x. Find the value of the constant A and compute P(X < 1) and P(|X| > 1).

17. Three concentric circles of radii $1/\sqrt3$, 1 and $\sqrt3$ feet are drawn on a target board. If a shot falls within the innermost circle 3 points are scored; if it falls within the next two rings the score is respectively 2 and 1, and the score is 0 if the shot is outside the outermost circle. If the probability density of the distance of the hit from the centre of the target is
$$\frac{2}{\pi}\,\frac{1}{1+r^2}$$
find the probability distribution of the score.


18. A point X is chosen at random on a line segment AB whose middle point is O. Find the probability that AX, BX and AO can form the sides of a triangle.

19. A point P is chosen at random on a circle of radius a and A is a fixed point on the circle. Show that the probability that the chord AP will exceed the length of the side of an equilateral triangle inscribed in the circle is 1/3.

20. A point P is taken at random on a line segment AB of length 2a. Find the probability that the area of the rectangle AP.PB will exceed $\tfrac12 a^2$.

21. A point chosen at random in a given interval divides it into two sub-intervals. Find the probability that the ratio of the length of the left sub-interval to that of the right sub-interval is less than a constant k.


22. If X is a normal (m, σ) variate, prove that
$$P(a < X < b) = \Phi\left(\frac{b-m}{\sigma}\right) - \Phi\left(\frac{a-m}{\sigma}\right)$$
and
$$P(|X - m| > a\sigma) = 2[1 - \Phi(a)]$$
where Φ(x) denotes the standard normal distribution function.

23. If X is uniformly distributed in the interval (-1, 1), find the distribution of |X|.




24. A point is chosen at random on a semi-circle having centre at the origin and radius unity and projected on the diameter. Prove that the distance of the point of projection from the centre has probability density
$$\frac{1}{\pi\sqrt{1-x^2}} \qquad \text{for } -1 < x < 1$$
and zero elsewhere.

25. A straight line is drawn through a fixed point (λ, μ) (λ > 0) making an angle X, which is chosen at random in the interval (0, π), with the y-axis. Prove that the intercept on the y-axis, Y, has a Cauchy distribution with parameters λ, μ.

26. The probability density of a random variable X is $2xe^{-x^2}$ for x > 0 and zero otherwise. Find the probability density for $X^2$.

27. If X is normal (0, 1), find the distribution of $e^X$.

28. If X is a γ(l) variate, find the density function for $\sqrt{X}$.

29. If X is a $\beta_1(l, m)$ variate, then show that X/(1 - X) is a $\beta_2(l, m)$ variate.

30. Find the distribution of the square of a Poisson-μ variate.


CHAPTER 6 


TWO-DIMENSIONAL DISTRIBUTIONS 


6.1 DISTRIBUTION FUNCTION IN TWO DIMENSIONS 


Let X and Y be two random variables defined on the same event space S. The joint distribution function $F_{x,y}(x, y)$ or simply F(x, y) of X and Y, or the distribution function of the two-dimensional random variable (X, Y) is defined by
$$F(x, y) = P(-\infty < X \le x,\ -\infty < Y \le y) \tag{6.1.1}$$
where the event $(-\infty < X \le x,\ -\infty < Y \le y)$ means the joint occurrence of the two events $-\infty < X \le x$ and $-\infty < Y \le y$, i.e.
$$(-\infty < X \le x,\ -\infty < Y \le y) = (-\infty < X \le x)(-\infty < Y \le y)$$

Properties of F(x, y)
1. Let a < b, c < d. We have
$$(-\infty < X \le a,\ -\infty < Y \le c) + (a < X \le b,\ -\infty < Y \le c) = (-\infty < X \le b,\ -\infty < Y \le c)$$
and the events on the L.H.S. are mutually exclusive. So
$$F(a, c) + P(a < X \le b,\ -\infty < Y \le c) = F(b, c)$$
or
$$F(b, c) - F(a, c) = P(a < X \le b,\ -\infty < Y \le c) \tag{6.1.2}$$
Since the R.H.S. of (6.1.2) is non-negative
$$F(b, c) \ge F(a, c)$$
Similarly, it follows that
$$F(a, d) - F(a, c) = P(-\infty < X \le a,\ c < Y \le d) \tag{6.1.3}$$
whence
$$F(a, d) \ge F(a, c)$$
These show that F(x, y) is monotonic non-decreasing in both the variables x and y.




2. Consider the half-open rectangular region :
$$a < x \le b,\quad c < y \le d$$
of the xy-plane. Clearly
$$F(b, d) + F(a, c) - F(a, d) - F(b, c) = P(a < X \le b,\ c < Y \le d) \tag{6.1.4}$$

[Fig. 10]

3. In (6.1.1) making $x \to -\infty$ and $y \to -\infty$ we get respectively
$$F(-\infty, y) = 0,\quad F(x, -\infty) = 0 \tag{6.1.5}$$
Also making both x and y $\to \infty$ we have
$$F(\infty, \infty) = 1 \tag{6.1.6}$$
4. From (6.1.2) and (6.1.3) it follows that
$$F(a+0, c) = F(a, c),\quad F(a, c+0) = F(a, c) \tag{6.1.7}$$
5. From (6.1.4) we get the following :
$$F(b, d) + F(b-0, c) - F(b-0, d) - F(b, c) = P(X = b,\ c < Y \le d) \tag{6.1.8}$$
$$F(b, d) + F(a, d-0) - F(a, d) - F(b, d-0) = P(a < X \le b,\ Y = d) \tag{6.1.9}$$
$$F(b, d) + F(b-0, d-0) - F(b-0, d) - F(b, d-0) = P(X = b,\ Y = d) \tag{6.1.10}$$


Marginal distributions
Given the joint distribution function F(x, y) of X and Y we may easily calculate the individual distribution functions $F_x(x)$ and $F_y(y)$ of X and Y respectively as follows.
In (6.1.1) we make $y \to \infty$, and note that the event $-\infty < Y < \infty$ is the certain event S. Then
$$(-\infty < X \le x,\ -\infty < Y < \infty) = (-\infty < X \le x)S = (-\infty < X \le x)$$
Hence $F(x, \infty) = P(-\infty < X \le x)$ or
$$F_x(x) = F(x, \infty) \tag{6.1.11}$$
Similarly
$$F_y(y) = F(\infty, y) \tag{6.1.12}$$




The individual distributions of X and Y thus calculated are often 
called the marginal distributions. 


Remark. The word marginal is obviously superfluous and perhaps serves the only purpose of hinting at the fact that they have been calculated from a joint distribution.


Independent random variables
If the events $-\infty < X \le x$ and $-\infty < Y \le y$ are independent for all x, y, then
$$P(-\infty < X \le x,\ -\infty < Y \le y) = P(-\infty < X \le x)\,P(-\infty < Y \le y)$$
or
$$F(x, y) = F_x(x)\,F_y(y) \tag{6.1.13}$$
We shall take (6.1.13) as the definition of independence of the random variables X and Y.
A simpler equivalent criterion for the independence of X and Y is contained in the following theorem.
Theorem I. A necessary and sufficient condition for the independence of the random variables X and Y is that their joint distribution function F(x, y) can be written as the product of a function of x alone and a function of y alone.
Proof. The condition is obviously necessary. To prove its sufficiency, we note that if
$$F(x, y) = g(x)h(y)$$
then by (6.1.6)
$$g(\infty)h(\infty) = 1$$
Write
$$F(x, y) = \frac{g(x)}{g(\infty)}\cdot\frac{h(y)}{h(\infty)}$$
By (6.1.11)
$$F_x(x) = F(x, \infty) = \frac{g(x)}{g(\infty)}$$
Similarly
$$F_y(y) = \frac{h(y)}{h(\infty)}$$
Hence $F(x, y) = F_x(x)F_y(y)$, which proves the result.




Theorem II. If X and Y are independent, then
$$P(a < X \le b,\ c < Y \le d) = P(a < X \le b)\,P(c < Y \le d) \tag{6.1.14}$$
Proof. By (6.1.4)
$$P(a < X \le b,\ c < Y \le d) = F(b, d) + F(a, c) - F(a, d) - F(b, c)$$
If X and Y are independent, we have using (6.1.13)
$$\text{R.H.S.} = F_x(b)F_y(d) + F_x(a)F_y(c) - F_x(a)F_y(d) - F_x(b)F_y(c)$$
$$= \{F_x(b) - F_x(a)\}\{F_y(d) - F_y(c)\}$$
$$= P(a < X \le b)\,P(c < Y \le d)$$

Theorem III. If X and Y are independent,
$$P(X = b,\ Y = d) = P(X = b)\,P(Y = d) \tag{6.1.15}$$
Proof. This follows immediately from (6.1.14) by making $a \to b$, $c \to d$, both from the left.


6.2 DISCRETE DISTRIBUTIONS 


The distribution of the two-dimensional random variable (X, Y) will be called discrete if the distribution function F(x, y) is a step function in two dimensions having steps of heights $f_{ij}$ ($\ge 0$) at the points $(x_i, y_j)$ (i, j = 0, ±1, ±2, ...), i.e.
$$F(x, y) = \sum_{\beta=-\infty}^{j}\ \sum_{\alpha=-\infty}^{i} f_{\alpha\beta} \quad \text{for } x_i \le x < x_{i+1},\ y_j \le y < y_{j+1}$$
$$(i, j = 0, \pm1, \pm2, \ldots) \tag{6.2.1}$$
This function will satisfy all the necessary conditions for a distribution function if
$$\sum_{j=-\infty}^{\infty}\ \sum_{i=-\infty}^{\infty} f_{ij} = 1 \tag{6.2.2}$$
The details of the distribution may now be obtained as follows :
1. If (b, d) is not a step point, by (6.1.10) P(X = b, Y = d) = 0, but at a step point $(x_i, y_j)$
$$P(X = x_i,\ Y = y_j) = F(x_i, y_j) + F(x_i - 0, y_j - 0) - F(x_i - 0, y_j) - F(x_i, y_j - 0)$$
or
$$P(X = x_i,\ Y = y_j) = f_{ij} \tag{6.2.3}$$




That is, point probability masses $f_{ij}$ are situated at the points $(x_i, y_j)$ (i, j = 0, ±1, ±2, ...).
2. By (6.1.4)
$$P(a < X \le b,\ c < Y \le d) = \sum_{c < y_j \le d}\ \sum_{a < x_i \le b} f_{ij} \tag{6.2.4}$$
3. The (marginal) distribution of X is given by :
For $x_i \le x < x_{i+1}$
$$F_x(x) = F(x, \infty) = \sum_{\beta=-\infty}^{\infty}\ \sum_{\alpha=-\infty}^{i} f_{\alpha\beta} = \sum_{\alpha=-\infty}^{i} f_{\alpha\cdot}$$
where the symbol
$$f_{i\cdot} = \sum_{j=-\infty}^{\infty} f_{ij} \tag{6.2.5}$$
This shows that $F_x(x)$ is a step function having steps of height $f_{i\cdot}$ at $x_i$ (i = 0, ±1, ±2, ...). Hence it follows immediately from the theory of one-dimensional discrete distribution that the $x_i$'s are indeed the points of the spectrum of X, and $P(X = x_i) = f_{i\cdot}$ or
$$f_{xi} = f_{i\cdot} \tag{6.2.6}$$
Thus if we sum all the probability masses on the line $x = x_i$ for different values of i, we get the marginal distribution of X.

Similarly, the $y_j$'s denote the points of the spectrum of Y, and
$$f_{yj} = P(Y = y_j) = f_{\cdot j} \tag{6.2.7}$$
where
$$f_{\cdot j} = \sum_{i=-\infty}^{\infty} f_{ij} \tag{6.2.8}$$

4. The criterion of independence of the random variables X and Y, (6.1.13), reduces, in the discrete case, to the following :
$$f_{ij} = f_{i\cdot}\,f_{\cdot j} \quad \text{for all } i, j \tag{6.2.9}$$
The necessity of the condition (6.2.9) follows at once from Theorem III Sec. 6.1.


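If (6.2.9) holds, then from (6.2.1) we get :

For $x_i \le x < x_{i+1}$, $y_j \le y < y_{j+1}$
$$F(x, y) = \sum_{\beta=-\infty}^{j}\ \sum_{\alpha=-\infty}^{i} f_{\alpha\beta} = \sum_{\beta=-\infty}^{j}\ \sum_{\alpha=-\infty}^{i} f_{\alpha\cdot}\,f_{\cdot\beta} = \Big(\sum_{\alpha=-\infty}^{i} f_{\alpha\cdot}\Big)\Big(\sum_{\beta=-\infty}^{j} f_{\cdot\beta}\Big)$$
or
$$F(x, y) = F_x(x)\,F_y(y)$$
which holds for all x, y. Hence the condition is also sufficient.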


Examples 

1. An urn contains 4 white balls numbered 0, 1, 2, 3, 3 red balls numbered 0, 1, 2, and 2 black balls numbered 0, 1. The random experiment consists in drawing a ball at random from the urn, and two random variables X, Y are defined as follows : X takes values 0, 1 and 2 respectively for white, red and black balls, and Y denotes the number of the ball. Find the joint distribution of X and Y, and deduce therefrom the marginal distributions of X, Y.

The individual spectra of X and Y are given by
$$x_i = i \quad (i = 0, 1, 2)$$
$$y_j = j \quad (j = 0, 1, 2, 3)$$
Hence the spectrum of the two-dimensional random variable (X, Y) is
$$(x_i, y_j) = (i, j) \quad (i = 0, 1, 2;\ j = 0, 1, 2, 3)$$
and $f_{ij} = P(X = i,\ Y = j) = \tfrac19$ for all i, j, except $f_{13}$, $f_{22}$, $f_{23}$ which are all zero. This gives the joint distribution of X and Y.




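Now
$$f_{xi} = f_{i\cdot} = \sum_{j=0}^{3} f_{ij}$$
so that
$$f_{x0} = \tfrac49,\quad f_{x1} = \tfrac13,\quad f_{x2} = \tfrac29$$
and
$$f_{yj} = f_{\cdot j} = \sum_{i=0}^{2} f_{ij}$$
so that
$$f_{y0} = \tfrac13 = f_{y1},\quad f_{y2} = \tfrac29,\quad f_{y3} = \tfrac19$$
These marginal distributions of X and Y may be easily verified to be correct by direct computation.

The random variables X and Y are not independent, as the condition (6.2.9) is not fulfilled.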
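
The "direct computation" mentioned above can be sketched as follows. This is only an illustrative check with exact fractions; the names `balls_per_colour`, `f`, `f_x`, `f_y` and `independent` are assumptions introduced here, not notation from the text.

```python
from fractions import Fraction

# Colours 0, 1, 2 (white, red, black) carry 4, 3 and 2 numbered balls.
balls_per_colour = [4, 3, 2]
n_balls = sum(balls_per_colour)

# Joint masses f[i][j] = P(X = i, Y = j): 1/9 where the ball exists, else 0.
f = [[Fraction(1, n_balls) if j < balls_per_colour[i] else Fraction(0)
      for j in range(4)]
     for i in range(3)]

# Marginal masses, as in (6.2.5) and (6.2.8).
f_x = [sum(f[i][j] for j in range(4)) for i in range(3)]
f_y = [sum(f[i][j] for i in range(3)) for j in range(4)]

# Criterion (6.2.9): independence needs f[i][j] == f_x[i] * f_y[j] everywhere.
independent = all(f[i][j] == f_x[i] * f_y[j]
                  for i in range(3) for j in range(4))

if __name__ == "__main__":
    print(f_x, f_y, independent)
```

For instance $f_{00} = \tfrac19$ while $f_{0\cdot}f_{\cdot0} = \tfrac49\cdot\tfrac13 = \tfrac{4}{27}$, so the criterion already fails at the first point.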


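2. Let X and Y be two Poisson variates having parameters $\mu_1$ and $\mu_2$ respectively. We have
$$x_i = i \quad (i = 0, 1, 2, \ldots);\qquad f_{xi} = e^{-\mu_1}\frac{\mu_1^{\,i}}{i!}$$
$$y_j = j \quad (j = 0, 1, 2, \ldots);\qquad f_{yj} = e^{-\mu_2}\frac{\mu_2^{\,j}}{j!}$$
Then $(x_i, y_j) = (i, j)$ (i, j = 0, 1, 2, ...) gives the spectrum of (X, Y). Now if X and Y are independent, by (6.2.9)
$$f_{ij} = e^{-(\mu_1+\mu_2)}\frac{\mu_1^{\,i}\,\mu_2^{\,j}}{i!\,j!}$$
Conversely, if the above expression for $f_{ij}$ represents a joint distribution of X and Y, then we can prove that X and Y are independent. We have
$$f_{i\cdot} = \sum_{j=0}^{\infty} f_{ij} = e^{-(\mu_1+\mu_2)}\frac{\mu_1^{\,i}}{i!}\sum_{j=0}^{\infty}\frac{\mu_2^{\,j}}{j!} = e^{-\mu_1}\frac{\mu_1^{\,i}}{i!}$$
Similarly
$$f_{\cdot j} = e^{-\mu_2}\frac{\mu_2^{\,j}}{j!}$$
Therefore $f_{ij} = f_{i\cdot}f_{\cdot j}$ for all i, j. Hence the proof.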




6.3 CONTINUOUS DISTRIBUTIONS
The joint distribution of the two random variables X and Y is defined to be continuous if their joint distribution function F(x, y) is continuous everywhere and its first and second order partial derivatives are piecewise continuous everywhere, i.e. continuous in the whole xy-plane except that there may be a finite number of curves of jump discontinuity in any bounded region.

1. Since F(x, y) is continuous everywhere, by (6.1.10) the probability mass at any point (b, d),
$$P(X = b,\ Y = d) = 0$$
2. We note that
$$\int_c^d\!\int_a^b \frac{\partial^2 F}{\partial x\,\partial y}\,dx\,dy = \int_c^d \left\{\left(\frac{\partial F}{\partial y}\right)_{x=b} - \left(\frac{\partial F}{\partial y}\right)_{x=a}\right\}dy = F(b, d) - F(b, c) - F(a, d) + F(a, c)$$
By (6.1.4)
$$P(a < X \le b,\ c < Y \le d) = \int_c^d\!\int_a^b f(x, y)\,dx\,dy \tag{6.3.1}$$
where
$$f(x, y) = \frac{\partial^2 F}{\partial x\,\partial y} \tag{6.3.2}$$
f(x, y) is naturally called the joint probability density function of X and Y.

If, instead of a rectangular region, we consider any region R of the xy-plane, then
$$P\{(X, Y) \in R\} = \iint_R f(x, y)\,dx\,dy \tag{6.3.3}$$
3.
$$F(x, y) = \int_{-\infty}^{y}\!\int_{-\infty}^{x} f(x, y)\,dx\,dy \tag{6.3.4}$$
4. Since F(x, y) is monotonic in both the variables, we must have
$$f(x, y) \ge 0 \quad \text{for all } x, y \tag{6.3.5}$$




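and since $F(\infty, \infty) = 1$
$$\int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} f(x, y)\,dx\,dy = 1 \tag{6.3.6}$$
So f(x, y) must satisfy the conditions (6.3.5) and (6.3.6) in order to be a possible density function in two dimensions.

5. The probability differential
$$P(x < X \le x+dx,\ y < Y \le y+dy) = F(x+dx, y+dy) + F(x, y) - F(x+dx, y) - F(x, y+dy)$$
$$= dF(x, y) = \frac{\partial^2 F}{\partial x\,\partial y}\,dx\,dy = f(x, y)\,dx\,dy \tag{6.3.7}$$

6. The (marginal) distributions of X and Y are given by
$$F_x(x) = F(x, \infty) = \int_{-\infty}^{x}\!\int_{-\infty}^{\infty} f(x, y)\,dy\,dx$$
Hence
$$f_x(x) = F_x'(x) = \int_{-\infty}^{\infty} f(x, y)\,dy \tag{6.3.8}$$
Similarly
$$f_y(y) = \int_{-\infty}^{\infty} f(x, y)\,dx \tag{6.3.9}$$

Another method. We may also easily obtain (6.3.8) and (6.3.9) by using the method of differentials. We have
$$f_x(x)\,dx = P(x < X \le x+dx) = \int_{y=-\infty}^{\infty} f(x, y)\,dx\,dy = dx\int_{-\infty}^{\infty} f(x, y)\,dy$$
Therefore
$$f_x(x) = \int_{-\infty}^{\infty} f(x, y)\,dy$$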




Remark. The method of differentials, although possibly not very 
rigorous, will be sometimes found useful in our theory. In fact, 
it readily helps building our intuitive picture of how the probability 
mass is spread over the xy-plane. 


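7. A necessary and sufficient condition for the independence of X and Y, in the continuous case, is
$$f(x, y) = f_x(x)\,f_y(y) = \left\{\int_{-\infty}^{\infty} f(x, y)\,dy\right\}\left\{\int_{-\infty}^{\infty} f(x, y)\,dx\right\} \tag{6.3.10}$$
If X and Y are independent,
$$F(x, y) = F_x(x)\,F_y(y)$$
So
$$\frac{\partial^2 F}{\partial x\,\partial y} = F_x'(x)\,F_y'(y)$$
or
$$f(x, y) = f_x(x)\,f_y(y)$$
If (6.3.10) holds, then integrating we have
$$\int_{-\infty}^{y}\!\int_{-\infty}^{x} f(x, y)\,dx\,dy = \left\{\int_{-\infty}^{x} f_x(x)\,dx\right\}\left\{\int_{-\infty}^{y} f_y(y)\,dy\right\}$$
which gives (6.1.13). Hence the result.

Another method. The necessity of the condition (6.3.10) may also be proved by the method of differentials. By Theorem II Sec. 6.1, if X and Y are independent
$$P(x < X \le x+dx,\ y < Y \le y+dy) = P(x < X \le x+dx)\,P(y < Y \le y+dy)$$
or
$$f(x, y)\,dx\,dy = f_x(x)\,dx\,f_y(y)\,dy$$
or
$$f(x, y) = f_x(x)\,f_y(y)$$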


Example. The joint probability density function of two random variables X and Y is K(1 - x - y) inside the triangle formed by the axes and the line x + y = 1 and zero elsewhere. Find the value of K and calculate $P(X < \tfrac12,\ Y > \tfrac14)$. Find also the marginal distributions of X, Y, and determine whether the random variables are independent or not.

By question
f(x, y) = K(1 - x - y)  for x > 0, y > 0, x + y < 1
        = 0              elsewhere
By (6.3.6)
$$1 = K\int_0^1\!\int_0^{1-x} (1 - x - y)\,dy\,dx = \tfrac12 K\int_0^1 (1 - x)^2\,dx = \tfrac16 K$$
so that K = 6. Then
$$P(X < \tfrac12,\ Y > \tfrac14) = 6\int_0^{1/2}\!\int_{1/4}^{1-x} (1 - x - y)\,dy\,dx = 3\int_0^{1/2} (\tfrac34 - x)^2\,dx = \tfrac{13}{32}$$
From (6.3.8)
$$f_x(x) = 6\int_0^{1-x} (1 - x - y)\,dy = 3(1 - x)^2 \qquad (0 < x < 1)$$
Similarly
$$f_y(y) = 6\int_0^{1-y} (1 - x - y)\,dx = 3(1 - y)^2 \qquad (0 < y < 1)$$
Since $f(x, y) \ne f_x(x)\,f_y(y)$, X, Y are dependent.
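
The figures in this example can be checked numerically. The sketch below is an illustration only: it approximates the double integrals of (6.3.6) and (6.3.1) by a midpoint rule on a rectangular grid, and the helper names `integrate_2d` and `density` as well as the grid size are assumptions introduced here.

```python
def integrate_2d(g, x_lo, x_hi, y_lo, y_hi, n=400):
    """Midpoint-rule approximation of the double integral of g over a rectangle."""
    hx = (x_hi - x_lo) / n
    hy = (y_hi - y_lo) / n
    total = 0.0
    for i in range(n):
        x = x_lo + (i + 0.5) * hx
        for j in range(n):
            y = y_lo + (j + 0.5) * hy
            total += g(x, y)
    return total * hx * hy


def density(x, y, K=6.0):
    """Joint density K(1 - x - y) inside the triangle, zero elsewhere."""
    if x > 0 and y > 0 and x + y < 1:
        return K * (1.0 - x - y)
    return 0.0


if __name__ == "__main__":
    # Total mass, to be compared with 1 as required by (6.3.6).
    print(integrate_2d(density, 0.0, 1.0, 0.0, 1.0))
    # P(X < 1/2, Y > 1/4), to be compared with 13/32.
    print(integrate_2d(density, 0.0, 0.5, 0.25, 1.0))
```

Because the density vanishes outside the triangle, integrating over an enclosing rectangle gives the same value as integrating over the triangle itself.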


6.4 IMPORTANT TWO-DIMENSIONAL OR BIVARIATE 
CONTINUOUS DISTRIBUTIONS 


(a) Rectangular or uniform distribution. The density function is given by
$$f(x, y) = \frac{1}{(b-a)(d-c)} \quad \text{in } a < x < b,\ c < y < d$$
$$\phantom{f(x, y)} = 0 \quad \text{elsewhere} \tag{6.4.1}$$




The condition (6.3.6) is satisfied. There are four parameters of this distribution, viz. a, b, c, d (b > a, d > c).

It is easily seen that X and Y are independent having rectangular distributions in one dimension with parameters (a, b) and (c, d) respectively.

Conversely, if X and Y are independent and uniformly distributed over the intervals (a, b) and (c, d) respectively, then the two-dimensional random variable (X, Y) is uniformly distributed over the rectangle a < x < b, c < y < d.

We may also have a uniform distribution in any region R of the xy-plane, for which
$$f(x, y) = \frac{1}{R} \quad \text{within } R$$
$$\phantom{f(x, y)} = 0 \quad \text{outside } R$$
where R denotes the area of the region R as well.
If R' is any subregion of R, then by (6.3.3)
$$P\{(X, Y) \in R'\} = \iint_{R'} f(x, y)\,dx\,dy = \frac{R'}{R} \tag{6.4.2}$$
Let us now solve some interesting problems by the application of formula (6.4.2).


Examples 

1. BUFFON'S NEEDLE PROBLEM. A vertical board is ruled with horizontal parallel lines at constant distance b apart. A needle of length a (< b) is thrown at random on the board. Find the probability that it will intersect one of the lines.

Let the random variable X denote the inclination of the needle to the horizontal and the random variable Y the perpendicular distance of the higher end of the needle from the ruling just below it. From the conditions of the experiment we may reasonably assume X and Y to be uniformly distributed, the former over the interval (0, π) and the latter over (0, b), and that X and Y are independent. Then the two-dimensional variate (X, Y) is uniformly distributed over the region R : 0 < x < π, 0 < y < b.

Now the event that the needle intersects one of the lines can be represented by the inequalities 0 < Y < a sin X or that (X, Y) lies in the region R' : 0 < y < a sin x. Now
$$R = \pi b,\qquad R' = \int_0^{\pi} a\sin x\,dx = 2a$$
By (6.4.2) the required probability is $2a/\pi b$.
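
The result can be illustrated by simulation. The sketch below is a rough Monte Carlo check under the same assumptions as the solution (X uniform over (0, π), Y uniform over (0, b), independent); the function name, the number of throws and the seed are arbitrary choices made here.

```python
import math
import random


def buffon_estimate(a, b, n_throws=200_000, seed=1):
    """Estimate the probability that a needle of length a (< b) cuts a ruling."""
    rng = random.Random(seed)  # fixed seed so that the run is reproducible
    hits = 0
    for _ in range(n_throws):
        x = rng.uniform(0.0, math.pi)  # inclination of the needle
        y = rng.uniform(0.0, b)        # height of the higher end above the ruling below
        if y < a * math.sin(x):        # (X, Y) falls in the region R'
            hits += 1
    return hits / n_throws


if __name__ == "__main__":
    a, b = 1.0, 2.0
    print(buffon_estimate(a, b), 2 * a / (math.pi * b))
```

The estimate and the exact value $2a/\pi b$ should differ only by the sampling error, which is of the order of $1/\sqrt{n}$ for n throws.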


2. Two points are independently chosen at random in the interval (-1, 1). Find the probability that the three parts into which the interval is divided can form the sides of a triangle.

Let the two points be represented by the random variables X and Y which are independent, each having a uniform distribution in (-1, 1).

[Fig. 14]

If Y > X, the required event is represented by the following three inequalities :
X + 1 + Y - X > 1 - Y   or   Y > 0
Y - X + 1 - Y > X + 1   or   X < 0
X + 1 + 1 - Y > Y - X   or   Y < X + 1
i.e. (X, Y) lies in the triangular region $R_1$ bounded by the axes and the line y = x + 1.

Similarly, when X > Y, (X, Y) lies in the triangular region $R_2$ bounded by the axes and y = x - 1. Since (X, Y) is uniformly distributed over the square R : -1 < x < 1, -1 < y < 1, for which R = 4 and $R_1 = R_2 = \tfrac12$, the required probability $= \dfrac{R_1 + R_2}{R} = \dfrac14$.


3. X and Y are independent variates, each uniformly distributed over the interval (0, 1). Find the probability that the greater of X, Y is less than a fixed number k (0 < k < 1).

The two-dimensional random variable (X, Y) is uniformly distributed over the unit square R : 0 < x < 1, 0 < y < 1. The required event is max (X, Y) < k.

If Y > X, max (X, Y) = Y and the required event means Y < k, or in other words, (X, Y) lies in the triangular region $R_1$ : x > 0, y < k, y > x.

Similarly, in case X > Y, the required event is equivalent to the fact that (X, Y) lies in the triangular region $R_2$ : x < k, y > 0, y < x.

Now $R_1$ and $R_2$ together form the square R' : 0 < x < k, 0 < y < k. Hence R = 1, R' = k² and the required probability is R'/R = k².


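(b) Bivariate normal distribution. Here
$$f(x, y) = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}}\exp\left[-\frac{1}{2(1-\rho^2)}\left\{\frac{(x-m_x)^2}{\sigma_x^2} - \frac{2\rho(x-m_x)(y-m_y)}{\sigma_x\sigma_y} + \frac{(y-m_y)^2}{\sigma_y^2}\right\}\right] \tag{6.4.3}$$
$(-\infty < x < \infty,\ -\infty < y < \infty)$, where $m_x$, $m_y$, $\sigma_x$ (> 0), $\sigma_y$ (> 0) and ρ (-1 < ρ < 1) are the five parameters of the distribution.
The (marginal) distribution of X is given by
$$f_x(x) = \int_{-\infty}^{\infty} f(x, y)\,dy = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}}\int_{-\infty}^{\infty}\exp\left[-\frac{1}{2(1-\rho^2)}\left\{\frac{(x-m_x)^2}{\sigma_x^2} - \frac{2\rho(x-m_x)(y-m_y)}{\sigma_x\sigma_y} + \frac{(y-m_y)^2}{\sigma_y^2}\right\}\right]dy$$
Setting $x' = \dfrac{x-m_x}{\sigma_x}$ and $y' = \dfrac{y-m_y}{\sigma_y}$ we get
$$f_x(x) = \frac{1}{2\pi\sigma_x\sqrt{1-\rho^2}}\int_{-\infty}^{\infty} e^{-\frac{x'^2 - 2\rho x'y' + y'^2}{2(1-\rho^2)}}\,dy'$$
$$= \frac{1}{\sqrt{2\pi}\,\sigma_x}\,e^{-\frac12 x'^2}\cdot\frac{1}{\sqrt{2\pi}\sqrt{1-\rho^2}}\int_{-\infty}^{\infty} e^{-\frac{(y'-\rho x')^2}{2(1-\rho^2)}}\,dy'$$
$$= \frac{1}{\sqrt{2\pi}\,\sigma_x}\,e^{-\frac12 x'^2}$$
because the latter term of the product is of the form $\dfrac{1}{\sqrt{2\pi}\,\sigma}\displaystyle\int_{-\infty}^{\infty} e^{-\frac{(y-m)^2}{2\sigma^2}}\,dy$ and hence has value 1. Thus
$$f_x(x) = \frac{1}{\sqrt{2\pi}\,\sigma_x}\,e^{-\frac{(x-m_x)^2}{2\sigma_x^2}}$$
This shows that X is normal ($m_x$, $\sigma_x$). Similarly, Y is normal ($m_y$, $\sigma_y$). It is interesting to note that the individual distributions of X and Y are independent of the parameter ρ. It also follows that
$$\int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} f(x, y)\,dx\,dy = \int_{-\infty}^{\infty} f_x(x)\,dx = 1$$

Consider the family of ellipses given by
$$\frac{(x-m_x)^2}{\sigma_x^2} - \frac{2\rho(x-m_x)(y-m_y)}{\sigma_x\sigma_y} + \frac{(y-m_y)^2}{\sigma_y^2} = \lambda^2 \tag{6.4.4}$$
λ being the parameter of the family. On any one of these ellipses the density function is constant, and hence these are called equiprobability ellipses.
On the ellipse λ
$$f(x, y) = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}}\,e^{-\frac{\lambda^2}{2(1-\rho^2)}}$$
The area of this ellipse
$$A = A(\lambda) = \frac{\pi\lambda^2\sigma_x\sigma_y}{\sqrt{1-\rho^2}}$$
so that
$$dA = \frac{2\pi\lambda\sigma_x\sigma_y}{\sqrt{1-\rho^2}}\,d\lambda$$
which represents the elementary area of the strip between the ellipses λ and λ + dλ. Hence the probability that (X, Y) lies in this strip
$$= \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}}\,e^{-\frac{\lambda^2}{2(1-\rho^2)}}\cdot\frac{2\pi\lambda\sigma_x\sigma_y}{\sqrt{1-\rho^2}}\,d\lambda = \frac{\lambda}{1-\rho^2}\,e^{-\frac{\lambda^2}{2(1-\rho^2)}}\,d\lambda$$
Therefore, the probability that (X, Y) lies within the ellipse λ
$$= \frac{1}{1-\rho^2}\int_0^{\lambda}\lambda\,e^{-\frac{\lambda^2}{2(1-\rho^2)}}\,d\lambda = 1 - e^{-\frac{\lambda^2}{2(1-\rho^2)}}$$
Hence the probability that (X, Y) lies outside the ellipse λ is
$$e^{-\frac{\lambda^2}{2(1-\rho^2)}}$$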

6.5 CONDITIONAL DISTRIBUTIONS
Discrete case. By definition the conditional probability
$$P(X = x_i \mid Y = y_j) = \frac{P(X = x_i,\ Y = y_j)}{P(Y = y_j)} = \frac{f_{ij}}{f_{\cdot j}}$$
Denoting the L.H.S. by $f_{i|j}$ we have
$$f_{i|j} = \frac{f_{ij}}{f_{\cdot j}} \tag{6.5.1}$$
We note $f_{i|j} \ge 0$ and
$$\sum_{i=-\infty}^{\infty} f_{i|j} = 1 \tag{6.5.2}$$
Hence, for a fixed value of j, $f_{i|j}$ may denote the probability masses of a one-dimensional discrete distribution which is called the conditional distribution of X on the hypothesis $Y = y_j$. Thus the conditional distribution of X on the hypothesis $Y = y_j$ is obtained by dividing every probability mass on the line $y = y_j$ by the total mass on that line.

Similarly, the conditional distribution of Y on the hypothesis $X = x_i$ is defined by
$$f_{j|i} = \frac{f_{ij}}{f_{i\cdot}} \tag{6.5.3}$$
so that
$$\sum_{j=-\infty}^{\infty} f_{j|i} = 1 \tag{6.5.4}$$
We may write (6.5.1) and (6.5.3) in the form of a multiplication rule :
$$f_{ij} = f_{i\cdot}\,f_{j|i} = f_{\cdot j}\,f_{i|j} \tag{6.5.5}$$
Comparing (6.5.5) with (6.2.9) we see that the condition of independence is equivalent to
$$f_{i|j} = f_{i\cdot} \quad \text{or} \quad f_{j|i} = f_{\cdot j} \tag{6.5.6}$$
any one implying the other. The form (6.5.6) of the condition of independence of X and Y certainly has a ready intuitive appeal.


Example 1. Find the conditional distributions in Ex. 1 Sec. 6.2.
The conditional distribution of X on the hypothesis $Y = y_0 = 0$ is given by
$$f_{0|0} = \frac{f_{00}}{f_{\cdot 0}} = \tfrac13,\quad f_{1|0} = \tfrac13,\quad f_{2|0} = \tfrac13, \text{ etc.}$$

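Continuous case. We have
$$P(a < X \le b \mid y < Y \le y + \Delta y) = \frac{P(a < X \le b,\ y < Y \le y + \Delta y)}{P(y < Y \le y + \Delta y)} = \frac{\displaystyle\int_y^{y+\Delta y}\!\int_a^b f(x, y)\,dx\,dy}{\displaystyle\int_y^{y+\Delta y} f_y(y)\,dy}$$
We assume that f(x, y) and $f_y(y)$ are continuous in their respective ranges of integration, so that using the mean-value theorem we get
$$P(a < X \le b \mid y < Y \le y + \Delta y) = \frac{\Delta y\displaystyle\int_a^b f(x, \eta_1)\,dx}{\Delta y\,f_y(\eta_2)} = \frac{\displaystyle\int_a^b f(x, \eta_1)\,dx}{f_y(\eta_2)} \qquad (y < \eta_1,\ \eta_2 < y + \Delta y)$$
Now making $\Delta y \to 0$ and writing
$$\lim_{\Delta y \to 0} P(a < X \le b \mid y < Y \le y + \Delta y) = P(a < X \le b \mid Y = y) \tag{6.5.7}$$
we have
$$P(a < X \le b \mid Y = y) = \frac{\displaystyle\int_a^b f(x, y)\,dx}{f_y(y)}$$
this step being justified by virtue of the above assumption of continuity of f(x, y) and $f_y(y)$.

Setting
$$f_x(x \mid y) = \frac{f(x, y)}{f_y(y)} \tag{6.5.8}$$
$$P(a < X \le b \mid Y = y) = \int_a^b f_x(x \mid y)\,dx \tag{6.5.9}$$
Hence $f_x(x \mid y) \ge 0$ and integrating (6.5.8) we get, for any fixed y
$$\int_{-\infty}^{\infty} f_x(x \mid y)\,dx = 1 \tag{6.5.10}$$
These show that, for any fixed y, $f_x(x \mid y)$ behaves like the density function of a one-dimensional distribution and is called the conditional density of X on the hypothesis Y = y.

The conditional distribution function $F_x(x \mid y)$ is given by
$$F_x(x \mid y) = P(-\infty < X \le x \mid Y = y) = \int_{-\infty}^{x} f_x(x \mid y)\,dx \tag{6.5.11}$$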



Similarly, we define the conditional density function of Y on the hypothesis X = x by
$$f_y(y \mid x) = \frac{f(x, y)}{f_x(x)} \tag{6.5.12}$$
Then
$$\int_{-\infty}^{\infty} f_y(y \mid x)\,dy = 1 \tag{6.5.13}$$
and
$$F_y(y \mid x) = \int_{-\infty}^{y} f_y(y \mid x)\,dy \tag{6.5.14}$$
Combining (6.5.8) and (6.5.12) we have
$$f(x, y) = f_x(x)\,f_y(y \mid x) = f_y(y)\,f_x(x \mid y) \tag{6.5.15}$$
The criterion of independence (6.3.10) reduces to
$$f_x(x \mid y) = f_x(x) \quad \text{or} \quad f_y(y \mid x) = f_y(y) \tag{6.5.16}$$
where any one implies the other.

Remark. The conditional probability $P(a < X \le b \mid Y = y)$ is as such meaningless, since the hypothesis Y = y, for a continuous distribution, is a stochastically impossible event. We have, however, avoided this difficulty by defining it as a limit by (6.5.7).

Another method. In terms of the differentials
$$P(x < X \le x+dx \mid y < Y \le y+dy) = \frac{P(x < X \le x+dx,\ y < Y \le y+dy)}{P(y < Y \le y+dy)} = \frac{f(x, y)\,dx\,dy}{f_y(y)\,dy} = f_x(x \mid y)\,dx$$
whence $f_x(x \mid y)$ may be interpreted as the conditional density function of X on the hypothesis Y = y.


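Examples
2. Find the conditional probability density function $f_x(x \mid y)$ in the example of Sec. 6.3 and compute $P(X < \tfrac12 \mid Y = \tfrac14)$.

By (6.5.8)
$$f_x(x \mid y) = \frac{f(x, y)}{f_y(y)} = \frac{2(1 - x - y)}{(1 - y)^2} \qquad (0 < x < 1 - y)$$
where y is a fixed number such that 0 < y < 1. If $y = \tfrac14$,
$$f_x(x \mid \tfrac14) = \tfrac{32}{9}(\tfrac34 - x) \qquad (0 < x < \tfrac34)$$
so that by (6.5.9)
$$P(X < \tfrac12 \mid Y = \tfrac14) = \tfrac{32}{9}\int_0^{1/2} (\tfrac34 - x)\,dx = \tfrac89$$

3. Bivariate normal distribution. Here
$$f_x(x \mid y) = \frac{f(x, y)}{f_y(y)} = \frac{1}{\sqrt{2\pi}\,\sigma_x\sqrt{1-\rho^2}}\exp\left[-\frac{\left\{x - m_x - \rho\dfrac{\sigma_x}{\sigma_y}(y - m_y)\right\}^2}{2\sigma_x^2(1-\rho^2)}\right] \tag{6.5.17}$$
which shows that the conditional distribution of X on the hypothesis Y = y is also normal having parameters
$$\left\{m_x + \rho\frac{\sigma_x}{\sigma_y}(y - m_y),\ \sigma_x\sqrt{1-\rho^2}\right\}$$
Similarly, the conditional distribution of Y on the hypothesis X = x is normal
$$\left\{m_y + \rho\frac{\sigma_y}{\sigma_x}(x - m_x),\ \sigma_y\sqrt{1-\rho^2}\right\}$$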

6.6 TRANSFORMATION OF RANDOM VARIABLES IN TWO DIMENSIONS
We shall here discuss the continuous case only, and, for simplicity, use the method of differentials. Consider a transformation of variables
$$(x, y) \to (u, v)$$
given by
$$u = u(x, y),\quad v = v(x, y)$$
where u(x, y) and v(x, y) are continuously differentiable functions for which the Jacobian of the transformation, $\dfrac{\partial(u, v)}{\partial(x, y)}$, is either > 0 or < 0 throughout the xy-plane, so that the inverse transformation $(u, v) \to (x, y)$ is uniquely given by x = x(u, v), y = y(u, v).




Now given the joint density function of the two variates X and Y, we shall find that of the variates U = u(X, Y) and V = v(X, Y).

We note that under the above transformation the joint probability differential remains unaltered, i.e.
$$dF = P(x < X \le x+dx,\ y < Y \le y+dy) = P(u < U \le u+du,\ v < V \le v+dv)$$
or
$$f_{u,v}(u, v)\,du\,dv = f_{x,y}(x, y)\,dx\,dy = f_{x,y}(x, y)\left|\frac{\partial(x, y)}{\partial(u, v)}\right|du\,dv$$
giving
$$f_{u,v}(u, v) = f_{x,y}(x, y)\left|\frac{\partial(x, y)}{\partial(u, v)}\right| \tag{6.6.1}$$
the R.H.S. being expressed as a function of u, v.


Let us now prove an important theorem. 


Theorem I. Let u = u(x) and v = v(y) be continuously differentiable and strictly monotonic functions of their respective arguments. If the random variables X and Y are independent, then so are the random variables U = u(X) and V = v(Y).

Proof. By (5.9.3) the distributions of U and V are given by
$$f_u(u) = f_x(x)\left|\frac{dx}{du}\right|,\qquad f_v(v) = f_y(y)\left|\frac{dy}{dv}\right|$$
Let us find the joint distribution of U and V. The Jacobian of the transformation $(x, y) \to (u, v)$ reduces to $\dfrac{du}{dx}\,\dfrac{dv}{dy}$, which is either > 0 or < 0 everywhere, as the functions u(x) and v(y) are strictly monotonic.

If X and Y are independent, we have by (6.6.1)
$$f_{u,v}(u, v) = f_x(x)\,f_y(y)\left|\frac{dx}{du}\right|\left|\frac{dy}{dv}\right| = f_u(u)\,f_v(v)$$
This shows that U and V are independent.




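Example 1. If the joint distribution of X and Y is the general bivariate normal distribution, and
$$U = \frac{X - m_x}{\sigma_x},\qquad V = \frac{1}{\sqrt{1-\rho^2}}\left(\frac{Y - m_y}{\sigma_y} - \rho\,\frac{X - m_x}{\sigma_x}\right)$$
then U and V are independent standard normal variates.

Proof. Set
$$u = \frac{x - m_x}{\sigma_x},\qquad v = \frac{1}{\sqrt{1-\rho^2}}\left(\frac{y - m_y}{\sigma_y} - \rho\,\frac{x - m_x}{\sigma_x}\right)$$
Then $\dfrac{\partial(u, v)}{\partial(x, y)} = \dfrac{1}{\sigma_x\sigma_y\sqrt{1-\rho^2}}$, a positive constant. By (6.6.1)
$$f_{u,v}(u, v) = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}}\exp\left[-\frac{1}{2(1-\rho^2)}\left\{\frac{(x-m_x)^2}{\sigma_x^2} - \frac{2\rho(x-m_x)(y-m_y)}{\sigma_x\sigma_y} + \frac{(y-m_y)^2}{\sigma_y^2}\right\}\right]\cdot\sigma_x\sigma_y\sqrt{1-\rho^2}$$
$$= \frac{1}{\sqrt{2\pi}}\,e^{-\frac12 u^2}\cdot\frac{1}{\sqrt{2\pi}}\,e^{-\frac12 v^2}$$
which proves the result.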


Distribution of a function of X, Y. Let u = u(x, y) be a given continuously differentiable function. To find the distribution of the variate U = u(X, Y) we proceed as follows. We assume further that the function u is such that there exists another function v = v(x, y) where the Jacobian $\dfrac{\partial(u, v)}{\partial(x, y)}$ is either > 0 or < 0 everywhere. First, we find the joint distribution of U and the variate V = v(X, Y) which is given by (6.6.1), and then from this joint distribution we calculate the marginal distribution of U by (6.3.8). Hence
$$f_u(u) = \int_{-\infty}^{\infty} f_{x,y}(x, y)\left|\frac{\partial(x, y)}{\partial(u, v)}\right|dv \tag{6.6.2}$$
where the integrand is expressed as a function of u, v.

Remark. The assumption of the existence of another function v = v(x, y) described above corresponds, in one dimension, to the condition that the function y = g(x) is strictly monotonic (cf. Sec. 5.9).




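Example 2. The joint density function of the random variables X, Y is given by
f(x, y) = x + y   0 < x < 1, 0 < y < 1
        = 0       elsewhere
Find the distribution of XY.
Set U = XY, V = X; u = xy, v = x so that x = v, y = u/v and $\dfrac{\partial(u, v)}{\partial(x, y)} = -x$. As x, y range from 0 to 1, u, v also range from 0 to 1. By (6.6.2)
$$f_u(u) = \int_{-\infty}^{\infty} f_{x,y}(x, y)\,\frac{1}{x}\,dv = \int_{-\infty}^{\infty} \frac{1}{v}\,f_{x,y}\!\left(v, \frac{u}{v}\right)dv$$
Now $f_{x,y}\!\left(v, \dfrac{u}{v}\right) = v + \dfrac{u}{v}$ for 0 < v < 1, 0 < u/v < 1, i.e. for u < v < 1 (0 < u < 1) and zero otherwise. Hence
$$f_u(u) = \int_u^1 \left(1 + \frac{u}{v^2}\right)dv = 2(1 - u) \qquad (0 < u < 1)$$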
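
The integral in (6.6.2) for this example can also be evaluated numerically, which gives an independent check on the closed form 2(1 - u). The sketch below uses a simple midpoint rule over u < v < 1; the function name and the number of subdivisions are assumptions made for illustration.

```python
def density_of_product(u, n=10_000):
    """Midpoint-rule value of (6.6.2) for U = XY with f(x, y) = x + y on the unit square.

    The integrand is f(v, u/v) / v = (v + u/v) / v, taken over u < v < 1.
    """
    if not 0.0 < u < 1.0:
        return 0.0
    h = (1.0 - u) / n
    total = 0.0
    for k in range(n):
        v = u + (k + 0.5) * h
        total += (v + u / v) / v
    return total * h


if __name__ == "__main__":
    for u in (0.1, 0.3, 0.7):
        print(u, density_of_product(u), 2.0 * (1.0 - u))
```

The lower limit of integration is u and not 0 because the integrand vanishes unless u/v, the value of y, lies below 1.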
Theorem II. If X and Y are independent continuous variates, then the density function of U = X + Y is given by
$$f_u(u) = \int_{-\infty}^{\infty} f_x(v)\,f_y(u - v)\,dv \tag{6.6.3}$$
Proof. Setting u = x + y, v = x,
$$x = v,\quad y = u - v;\qquad \frac{\partial(u, v)}{\partial(x, y)} = -1$$
Since X and Y are independent,
$$f_{x,y}(x, y) = f_x(x)\,f_y(y) = f_x(v)\,f_y(u - v)$$
(6.6.3) then follows at once from (6.6.2).
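
As a small illustration of (6.6.3), not taken from the text, consider two independent variates each uniformly distributed over (0, 1). The sketch below evaluates the integral by a midpoint rule; for this particular choice the sum has density u for 0 < u < 1 and 2 - u for 1 < u < 2. The function names and the step count are assumptions.

```python
def uniform_density(t):
    """Density of a variate uniformly distributed over (0, 1)."""
    return 1.0 if 0.0 < t < 1.0 else 0.0


def sum_density(u, f_x, f_y, lo=0.0, hi=1.0, n=20_000):
    """Midpoint-rule value of (6.6.3): integral of f_x(v) f_y(u - v) over lo < v < hi.

    (lo, hi) should contain the whole range on which f_x is non-zero.
    """
    h = (hi - lo) / n
    total = 0.0
    for k in range(n):
        v = lo + (k + 0.5) * h
        total += f_x(v) * f_y(u - v)
    return total * h


if __name__ == "__main__":
    for u in (0.5, 1.0, 1.5):
        print(u, sum_density(u, uniform_density, uniform_density))
```

Only the range on which $f_x$ is non-zero needs to be scanned, since the integrand vanishes elsewhere.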


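Examples

3. If X and Y are two independent normal variates ($m_x$, $\sigma_x$) and ($m_y$, $\sigma_y$) respectively, then U = X + Y is a normal variate (m, σ) where $m = m_x + m_y$, $\sigma^2 = \sigma_x^2 + \sigma_y^2$.

Proof. By (6.6.3)
$$f_u(u) = \frac{1}{2\pi\sigma_x\sigma_y}\int_{-\infty}^{\infty}\exp\left[-\frac12\left\{\frac{(v - m_x)^2}{\sigma_x^2} + \frac{(u - v - m_y)^2}{\sigma_y^2}\right\}\right]dv$$
Now
$$\frac{(v - m_x)^2}{\sigma_x^2} + \frac{(u - v - m_y)^2}{\sigma_y^2} = \frac{(v - m_x)^2}{\sigma_x^2} + \frac{\{(u - m) - (v - m_x)\}^2}{\sigma_y^2}$$
$$= \frac{\sigma^2}{\sigma_x^2\sigma_y^2}(v - m_x)^2 - \frac{2(u - m)(v - m_x)}{\sigma_y^2} + \frac{(u - m)^2}{\sigma_y^2}$$
$$= \frac{\sigma^2}{\sigma_x^2\sigma_y^2}\left\{v - m_x - \frac{\sigma_x^2}{\sigma^2}(u - m)\right\}^2 + \frac{(u - m)^2}{\sigma^2}$$
Then
$$f_u(u) = \frac{1}{\sqrt{2\pi}\,\sigma}\,e^{-\frac{(u - m)^2}{2\sigma^2}}\cdot\frac{\sigma}{\sqrt{2\pi}\,\sigma_x\sigma_y}\int_{-\infty}^{\infty}\exp\left[-\frac{\sigma^2}{2\sigma_x^2\sigma_y^2}\left\{v - m_x - \frac{\sigma_x^2}{\sigma^2}(u - m)\right\}^2\right]dv$$
$$= \frac{1}{\sqrt{2\pi}\,\sigma}\,e^{-\frac{(u - m)^2}{2\sigma^2}}$$
Hence the proof.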


We have thus proved a very important property of the normal distribution, viz. that the sum of two independent normal variates is again a normal variate. This is called the reproductive property of the normal distribution. The reproductive property is, however, not typical of the normal distribution alone but is also possessed by other distributions like the binomial, Poisson, gamma etc. Let us now prove the reproductive properties of the binomial and gamma distributions.

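4. If X and Y are independent γ-variates with parameters l and m respectively, then (a) X + Y is a γ(l + m) variate and (b) X/Y is a $\beta_2(l, m)$ variate.

Proof. We can prove both (a) and (b) simultaneously if we put
$$U = X + Y,\quad V = X/Y;\qquad u = x + y,\quad v = x/y$$
$$x = \frac{uv}{1 + v},\quad y = \frac{u}{1 + v},\qquad \frac{\partial(u, v)}{\partial(x, y)} = -\frac{x + y}{y^2} = -\frac{(1 + v)^2}{u}$$
As x, y range from 0 to ∞, u, v also range from 0 to ∞.

Since X and Y are independent, the joint probability differential
$$dF = \frac{e^{-x}x^{l-1}}{\Gamma(l)}\cdot\frac{e^{-y}y^{m-1}}{\Gamma(m)}\,dx\,dy = \frac{e^{-(x+y)}x^{l-1}y^{m-1}}{\Gamma(l)\Gamma(m)}\left|\frac{\partial(x, y)}{\partial(u, v)}\right|du\,dv$$
$$= \frac{e^{-u}}{\Gamma(l)\Gamma(m)}\left(\frac{uv}{1 + v}\right)^{l-1}\left(\frac{u}{1 + v}\right)^{m-1}\frac{u}{(1 + v)^2}\,du\,dv = \frac{e^{-u}\,u^{l+m-1}\,v^{l-1}}{\Gamma(l)\Gamma(m)(1 + v)^{l+m}}\,du\,dv$$
Noting that Γ(l)Γ(m) = B(l, m)Γ(l + m) we may write
$$dF = \frac{e^{-u}\,u^{l+m-1}}{\Gamma(l + m)}\,du\cdot\frac{v^{l-1}}{B(l, m)(1 + v)^{l+m}}\,dv$$
which shows that U and V are independent and also proves the results (a) and (b).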
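5. If X and Y are independent binomial variates ($n_1$, p) and ($n_2$, p) respectively, then their sum U = X + Y is a binomial ($n_1 + n_2$, p) variate.

Proof. The spectrum of U is given by
$$u_k = k \qquad (k = 0, 1, 2, \ldots, n_1 + n_2)$$
Then
$$f_{uk} = P(U = u_k) = P(X + Y = k) = \sum_{i+j=k} P(X = i,\ Y = j)$$
$$= \sum_{i+j=k} P(X = i)\,P(Y = j) \qquad [\text{since X and Y are independent}]$$
$$= \sum_{i+j=k}\binom{n_1}{i}p^i(1-p)^{n_1-i}\binom{n_2}{j}p^j(1-p)^{n_2-j}$$
$$= p^k(1-p)^{n_1+n_2-k}\sum_{i+j=k}\binom{n_1}{i}\binom{n_2}{j}$$
or
$$f_{uk} = \binom{n_1+n_2}{k}p^k(1-p)^{n_1+n_2-k}$$
Hence the result.
[To show $\displaystyle\sum_{i+j=k}\binom{n_1}{i}\binom{n_2}{j} = \binom{n_1+n_2}{k}$ consider the identity
$$(1 + x)^{n_1+n_2} = (1 + x)^{n_1}(1 + x)^{n_2}$$
Then
$$\sum_{k=0}^{n_1+n_2}\binom{n_1+n_2}{k}x^k = \sum_{j=0}^{n_2}\ \sum_{i=0}^{n_1}\binom{n_1}{i}\binom{n_2}{j}x^{i+j}$$
Equating the coefficients of $x^k$ from both sides, the above result follows.]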
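
The reproductive property of the binomial distribution can be confirmed exactly for particular values of the parameters. The following sketch, given only as an illustration, forms the distribution of the sum by adding the products P(X = i)P(Y = j) over i + j = k and compares it with the binomial masses; the function names and the chosen parameter values are assumptions.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(n, p):
    """Probability masses of a binomial (n, p) variate at the points 0, 1, ..., n."""
    return [comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)]


def pmf_of_sum(f_x, f_y):
    """Masses of X + Y for independent X, Y with spectra 0, 1, 2, ...

    The mass at k is the sum of f_x[i] * f_y[j] over all i + j = k.
    """
    out = [0] * (len(f_x) + len(f_y) - 1)
    for i, a in enumerate(f_x):
        for j, b in enumerate(f_y):
            out[i + j] += a * b
    return out


if __name__ == "__main__":
    p = Fraction(1, 3)  # exact arithmetic avoids rounding in the comparison
    lhs = pmf_of_sum(binomial_pmf(3, p), binomial_pmf(4, p))
    print(lhs == binomial_pmf(7, p))
```

With p replaced by 1/2 the common factor cancels and the comparison reduces to the identity for binomial coefficients proved in the square brackets above.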


6.7 EXTENSIONS TO MANY DIMENSIONS. MUTUAL INDEPENDENCE

Let us first take the case of three variates X, Y, Z. The joint distribution function of X, Y and Z or the distribution function of the three-dimensional variate (X, Y, Z) will be defined by
$$F(x, y, z) = P(-\infty < X \le x,\ -\infty < Y \le y,\ -\infty < Z \le z) \tag{6.7.1}$$
The (marginal) distribution of the two-dimensional variate (X, Y) is given by
$$F_{x,y}(x, y) = F(x, y, \infty) \tag{6.7.2}$$
and therefore
$$F_x(x) = F_{x,y}(x, \infty) = F(x, \infty, \infty) \tag{6.7.3}$$
and so on.
The variates (X, Y) and Z are said to be independent if
$$F(x, y, z) = F_{x,y}(x, y)\,F_z(z) \tag{6.7.4}$$
For the continuous case, if f(x, y, z) denotes the joint density of X, Y, Z, the marginal density functions are obtained as follows.
$$f_{x,y}(x, y) = \int_{-\infty}^{\infty} f(x, y, z)\,dz \tag{6.7.5}$$
$$f_x(x) = \int_{-\infty}^{\infty} f_{x,y}(x, y)\,dy = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} f(x, y, z)\,dy\,dz \tag{6.7.6}$$
etc. The condition of independence (6.7.4) becomes equivalent to
$$f(x, y, z) = f_{x,y}(x, y)\,f_z(z) \tag{6.7.7}$$


120 TWO-DIMENSIONAL DISTRIBUTIONS [6.7 


Theorem I. If the variates (X, Y) and Z are independent and
if $u = u(x, y)$ and $w = w(z)$ are continuous functions of their arguments,
then the variates $U = u(X, Y)$ and $W = w(Z)$ are also independent.

Proof. Let us prove the theorem for the continuous case
and, for the sake of this proof, assume further that $u = u(x, y)$
and $w = w(z)$ are continuously differentiable, and the function u is
such that there exists another function $v = v(x, y)$ which makes the
Jacobian $\dfrac{\partial(u, v)}{\partial(x, y)} >$ or $< 0$ throughout the xy-plane, and $\dfrac{dw}{dz} >$ or $< 0$
everywhere. The proof will be exactly similar to that of Theorem I
Sec. 6.6. We have

$$\frac{\partial(u, v, w)}{\partial(x, y, z)} = \frac{\partial(u, v)}{\partial(x, y)}\,\frac{dw}{dz}$$

Since $\dfrac{\partial(u, v, w)}{\partial(x, y, z)} >$ or $< 0$ everywhere, the extension of
(6.6.1) for three variates gives

$$f_{u,v,w}(u, v, w) = f_{x,y,z}(x, y, z)\left|\frac{\partial(x, y, z)}{\partial(u, v, w)}\right|$$

By (6.7.7)

$$f_{u,v,w}(u, v, w) = f_{x,y}(x, y)\, f_z(z)\left|\frac{\partial(x, y)}{\partial(u, v)}\right|\left|\frac{dz}{dw}\right|$$

or

$$f_{u,v,w}(u, v, w) = f_{u,v}(u, v)\, f_w(w)$$

Integrating with respect to v from $-\infty$ to $\infty$ we get

$$f_{u,w}(u, w) = f_u(u)\, f_w(w)$$

Therefore U and W are independent.


A new concept of independence necessarily arises for more than
two variates, viz. that of mutual independence. The variates X, Y, Z
are defined to be mutually independent if

$$F(x, y, z) = F_x(x)\, F_y(y)\, F_z(z) \tag{6.7.8}$$

Theorem II. If X, Y, Z are mutually independent, then (a) X
and Y are independent and (b) (X, Y) and Z are independent.

Proof.

$$F_{x,y}(x, y) = F(x, y, \infty) = F_x(x)\, F_y(y)\, F_z(\infty) = F_x(x)\, F_y(y)$$

and so

$$F(x, y, z) = F_{x,y}(x, y)\, F_z(z)$$

These prove (a) and (b).

The generalisation of the definition of mutual independence for n
variates is obvious. The n variates $X_1, X_2, \ldots, X_n$ will be called
mutually independent if their joint distribution function

$$F(x_1, x_2, \ldots, x_n) = F_{x_1}(x_1)\, F_{x_2}(x_2) \cdots F_{x_n}(x_n) \tag{6.7.9}$$

For continuous variates, the condition (6.7.9) reduces to

$$f(x_1, x_2, \ldots, x_n) = f_{x_1}(x_1)\, f_{x_2}(x_2) \cdots f_{x_n}(x_n) \tag{6.7.10}$$

where the L.H.S. denotes the joint density function of $X_1, X_2, \ldots, X_n$.

Theorem III. If $X_1, X_2, \ldots, X_n$ are mutually independent, then

(a) any group of $r\,(< n)$ of these variates are mutually
independent,

(b) the variates $(X_1, \ldots, X_{k_1}),\ (X_{k_1+1}, \ldots, X_{k_2}),\ \ldots,\ (X_{k_r+1}, \ldots, X_n)$
where $1 \le k_1 < k_2 < \cdots < k_r < n$, are mutually independent, and

(c) the variates $g_1(X_1, \ldots, X_{k_1}),\ g_2(X_{k_1+1}, \ldots, X_{k_2}),\ \ldots,\ g_{r+1}(X_{k_r+1}, \ldots, X_n)$,
where the g's denote continuous functions of their
arguments, are mutually independent.

Proof. Similar to the proofs of Theorems I and II.


Extensions of the reproductive properties

With the help of Theorem III we may now easily extend the
reproductive property of any distribution to the case of n variates.
For the normal distribution, in particular, we have the following
general result.

If $X_1, X_2, \ldots, X_n$ are mutually independent normal variates having
parameters $(m_1, \sigma_1), (m_2, \sigma_2), \ldots, (m_n, \sigma_n)$ respectively, then their sum
$X_1 + X_2 + \cdots + X_n$ is a normal $(m, \sigma)$ variate where

$$m = m_1 + m_2 + \cdots + m_n,\qquad \sigma^2 = \sigma_1^2 + \sigma_2^2 + \cdots + \sigma_n^2$$

Proof. If $X_1, X_2, \ldots, X_n$ are mutually independent, by Theorem
III(a) $X_1$ and $X_2$ are independent, and so by the reproductive property
for two variates $X_1 + X_2$ is normal $(m_1 + m_2,\ \sqrt{\sigma_1^2 + \sigma_2^2})$.

Then by Theorem III(c) $X_1 + X_2$ and $X_3$ are independent, and
hence $X_1 + X_2 + X_3$ is normal $(m_1 + m_2 + m_3,\ \sqrt{\sigma_1^2 + \sigma_2^2 + \sigma_3^2})$.

Repeating this argument, we arrive at the general result.
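
The inductive step of this proof, in which one more variate is absorbed at a time, can be mirrored by folding the two-variate rule over a list of parameters. This is a sketch only; the function names and the sample parameters are assumptions chosen for illustration.

```python
import math
from functools import reduce

def add_normals(a, b):
    """Parameters (m, sigma) of the sum of two independent normal variates."""
    (m1, s1), (m2, s2) = a, b
    return (m1 + m2, math.hypot(s1, s2))   # hypot gives sqrt(s1^2 + s2^2)

def sum_of_normals(params):
    """Apply the two-variate rule repeatedly, as in the proof above."""
    return reduce(add_normals, params)

params = [(1.0, 3.0), (2.0, 4.0), (-0.5, 12.0)]   # illustrative (m_i, sigma_i)
m, sigma = sum_of_normals(params)
print(m, sigma)
```

The means add and the variances (not the standard deviations) add, which is why the fold combines the $\sigma_i$ through a root of a sum of squares.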


Example. If three points are independently chosen at random on
a circle, find the probability that they will lie on a semi-circle.

Let P, Q, R be the random points chosen on the circle. Suppose
O is the centre of the circle and A a fixed point on it. Let the angles
made by OP, OQ, OR with the fixed direction OA be X, Y, Z
respectively. By question X, Y, Z are independent random variables,
each uniformly distributed over $(0, 2\pi)$, so that the three-dimensional
random variable (X, Y, Z) has a uniform distribution over the cube
$0 < x < 2\pi,\ 0 < y < 2\pi,\ 0 < z < 2\pi$, i.e. the joint density function
is given by

$$f(x, y, z) = \frac{1}{8\pi^3}\quad (0 < x < 2\pi,\ 0 < y < 2\pi,\ 0 < z < 2\pi);\qquad = 0 \text{ elsewhere}$$

Fig. 15

Consider the case X < Y < Z. The event in question may be
expressed by one of the following three mutually exclusive inequalities:

$$Z - X < \pi,\qquad Z - Y > \pi,\qquad Y - X > \pi$$

or (X, Y, Z) lies respectively in the regions (within the cube)

$$0 < x < 2\pi,\quad x < y < x + \pi,\quad y < z < x + \pi$$
$$0 < x < \pi,\quad x < y < \pi,\quad y + \pi < z < 2\pi$$
$$0 < x < \pi,\quad x + \pi < y < 2\pi,\quad y < z < 2\pi$$

Now

$$\iiint f(x, y, z)\,dx\,dy\,dz = \frac{1}{8\pi^3}\left\{\int_0^{\pi}\!dx\int_x^{x+\pi}\!dy\int_y^{x+\pi}\!dz + \int_{\pi}^{2\pi}\!dx\int_x^{2\pi}\!dy\int_y^{2\pi}\!dz\right\} = \frac{1}{8\pi^3}\left(\frac{\pi^3}{2} + \frac{\pi^3}{6}\right) = \frac{1}{12}$$

over the first region,

$$\iiint f(x, y, z)\,dx\,dy\,dz = \frac{1}{8\pi^3}\int_0^{\pi}\!dx\int_x^{\pi}\!dy\int_{y+\pi}^{2\pi}\!dz = \frac{1}{48}$$

over the second region, and

$$\iiint f(x, y, z)\,dx\,dy\,dz = \frac{1}{8\pi^3}\int_0^{\pi}\!dx\int_{x+\pi}^{2\pi}\!dy\int_y^{2\pi}\!dz = \frac{1}{48}$$

over the third. Noting that there are 3! = 6 cases such as X < Y < Z and making
use of symmetry, the required probability is

$$6\left(\frac{1}{12} + \frac{1}{48} + \frac{1}{48}\right) = \frac{3}{4}$$
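
A simulation offers an independent check on this answer. The sketch below assumes only that the three angles are independent and uniform on $(0, 2\pi)$; it uses the equivalent criterion that the points fit on a semi-circle exactly when some arc between neighbouring points is at least $\pi$. The function names, the number of trials and the seed are illustrative.

```python
import math
import random

def on_a_semicircle(angles):
    """True if the points at the given angles (radians) fit on some semi-circle."""
    a = sorted(angles)
    # Arcs between consecutive points, plus the arc that wraps past 2*pi.
    gaps = [a[i + 1] - a[i] for i in range(len(a) - 1)]
    gaps.append(2 * math.pi - (a[-1] - a[0]))
    return max(gaps) >= math.pi

def estimate(trials, seed=0):
    """Fraction of trials in which three random points lie on a semi-circle."""
    rng = random.Random(seed)
    hits = sum(
        on_a_semicircle([rng.uniform(0, 2 * math.pi) for _ in range(3)])
        for _ in range(trials)
    )
    return hits / trials

print(estimate(100_000))   # to be compared with the value obtained above
```

The three inequalities of the worked solution are exactly the three ways in which one of these arcs can reach $\pi$ when X < Y < Z.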


6.8. EXERCISES 


1. A ball is drawn from an urn containing 9 balls numbered 0, 1, 2, ..., 8,
of which the first 4 are white, the next 3 red and the last 2 black. If the colours
white, red and black are reckoned as colour number 0, 1 and 2 respectively,
find the joint distribution of the two random variables, the number of the ball and
the colour number. Calculate the marginal distributions of the individual
random variables and the conditional distribution of the number of the ball
on the assumption that the colour is white.

2. An urn contains 12 white, red and black balls, there being 4 balls of each
colour numbered 0, 1, 2, 3. The random experiment consists in drawing a ball
from the urn. If the colours are also assigned numbers (0 for white, 1 for red
and 2 for black), then show that the number of the ball and that of the colour
are independent random variables.


3. A card is drawn from a full pack of 52 cards. If X denotes the number
on the card (assuming that the jack, queen and king respectively correspond to
the numbers 11, 12 and 13) and Y takes values 1, 2, 3, 4 for spade, heart,
diamond and club respectively, find the distribution of the two-dimensional variate
(X, Y), and show that X and Y are independent.


4. Find the joint distribution of two independent variates, one of which
is Poisson distributed with parameter $\mu$ and the other binomially distributed
with parameters $(n, p)$.


5. Show that the function $f(x, y)$ defined by

$$f(x, y) = \sin x \sin y \quad (0 < x < \tfrac{1}{2}\pi,\ 0 < y < \tfrac{1}{2}\pi);\qquad = 0 \text{ elsewhere}$$

is a possible two-dimensional probability density function. Find the marginal
density functions, and prove that the random variables are independent.


6. Determine the value of the constant K which makes

$$f(x, y) = Kxy \quad (0 < x < 1,\ 0 < y < x)$$

a joint probability density function. Calculate the marginal density functions
and show that the variates are dependent.

7. The joint probability density function of two variates X, Y is given by

$$f(x, y) = (6 - x - y)/8 \quad (0 < x < 2,\ 2 < y < 4);\qquad = 0 \text{ elsewhere}$$

Calculate the following probabilities:

$$P(X < 1,\ Y < 3),\quad P(X + Y < 3),\quad P(X < 1 \mid Y = 3),\quad P(X < 1 \mid Y < 3)$$

8. If $f(x, y) = 3x^2 - 8xy + 6y^2$ $(0 < x < 1,\ 0 < y < 1)$, find $f_x(x \mid y)$ and $f_y(y \mid x)$,
and show that X and Y are dependent.

9. The joint density function of the random variables X, Y is given by:

$$f(x, y) = 2 \quad (0 < x < 1,\ 0 < y < x)$$
Find the marginal and conditional density functions. Compute Р( < X < 8|Y= р. 
10. Let X, Y be two random variables, each having spectrum $(-\infty, \infty)$.
If the conditional density function of X on the hypothesis $Y = y$ is $|y|\,e^{-x^2y^2}/\sqrt{\pi}$
and the density function of Y is $\lambda e^{-\lambda^2 y^2}/\sqrt{\pi}$, prove that the density function
of X is $\lambda/\{\pi(x^2 + \lambda^2)\}$.
11. Two points are independently chosen at random in the interval (0, 1). 
Find the probability that the distance between them is less than a fixed 
number k (0<k<1). 


12. Two numbers are independently chosen at random between 0 and 1. 
Show that the probability that their product is less than a constant k (0 <k <1) 
is k(1-log k). 




13. If p and q are independent variates each uniformly distributed over the
interval (-1, 1), find the probability that the equation $x^2 + 2px + q = 0$ has
real roots.

14. A dart is thrown at random on a square target board having vertices (1, 0),
(0, 1), (-1, 0) and (0, -1), the point at which the dart hits the board being
(X, Y). Find the marginal density functions of X and Y and show that they are
dependent.

15. A random point (X, Y) is uniformly distributed over a circular
region: $x^2 + y^2 < a^2$. Find the marginal distributions of X and Y, and the
conditional distribution of Y assuming that $X = x$ $(|x| < a)$.

16. A straight line AB is divided by a point C into two parts AC and CB
whose lengths are a and b respectively. If two points P and Q are independently
chosen at random on AC and CB respectively, find the probability that AP, PQ,
QB can form the sides of a triangle.

17. A floor is paved with tiles, each tile being a parallelogram. If a stick of
length c falls on the floor parallel to a diagonal whose length is l, then show
that the probability that it will lie entirely on one tile is $(1 - c/l)^2$. Show also
that if the distances between pairs of opposite sides of a tile are a and b and a
circle of diameter d falls on the floor, the probability that it will lie on one
tile is $(1 - d/a)(1 - d/b)$.

18. Two points P, Q are independently chosen at random on a circle and A 
is a fixed point also on the circle. Find the probability that the three points 
A, P, Q will lie on the same semi-circle. 

19. In the bivariate normal distribution the equiprobability ellipse $\lambda$ for which
the probability mass in the strip between the ellipses $\lambda$ and $\lambda + d\lambda$ is maximum
(for fixed $d\lambda$) is called the ellipse of maximum probability. Find the probability
mass outside this ellipse of maximum probability.

20. Let (X, Y) have the general two-dimensional normal distribution, and
we make a linear transformation:

$$U = (X - m_x)\cos\alpha + (Y - m_y)\sin\alpha,\qquad V = -(X - m_x)\sin\alpha + (Y - m_y)\cos\alpha$$

Show that U, V will be independent normal variates if

$$\tan 2\alpha = \frac{2\rho\sigma_x\sigma_y}{\sigma_x^2 - \sigma_y^2}$$


21. If f(x, y)=x+y (0<x<1,0<y<1) is the joint density function of the 
variates X and Y, find the distribution of X+ Y. 

22. If X and Y are independent variates both uniformly distributed over (0, 1),
find the distributions of X + Y, X - Y and XY.

23. If the Cartesian co-ordinates of a random point are independent standard 
Normal variates, show that its polar co-ordinates are also independent variates, 
and find their distributions. 




24. If $X_1, X_2$ are independent random variables each having the density
function $2xe^{-x^2}$ $(0 < x < \infty)$, find the density function for the random variable
$\sqrt{X_1^2 + X_2^2}$.

25. Let $X_1, X_2$ be independent variates each having the density function
$ae^{-ax}$ $(0 < x < \infty)$, where a is a positive constant. Find the density function
for $X_1/X_2$. Prove that the variate $X_1/(X_1 + X_2)$ is uniformly distributed
over (0, 1).

26. If X, Y are independent random variables whose density functions are
given by

$$f_x(x) = \frac{1}{\pi\sqrt{1 - x^2}} \quad (-1 < x < 1),\qquad f_y(y) = 2ye^{-y^2} \quad (0 < y < \infty)$$

prove that the random variable XY has a normal distribution.

27. Prove that the sum of two independent Poisson variates having parameters
$\mu_1$ and $\mu_2$ is a Poisson variate having parameter $\mu_1 + \mu_2$.

28. If X and Y are independent Cauchy variates having parameters $(\lambda_1, \mu_1)$
and $(\lambda_2, \mu_2)$ respectively, then show that X + Y also has a Cauchy distribution
having parameters $(\lambda_1 + \lambda_2,\ \mu_1 + \mu_2)$. Hence deduce that if $X_1, X_2, \ldots, X_n$ are n
mutually independent Cauchy variates each with parameters $(\lambda, \mu)$, their arithmetic
mean $\bar{X} = (X_1 + X_2 + \cdots + X_n)/n$ is a Cauchy variate with the same parameters
$(\lambda, \mu)$.

29. If the joint probability density function of the random variables X, Y, Z
is given by

$$f(x, y, z) = 6/(1 + x + y + z)^4 \quad (0 < x < \infty,\ 0 < y < \infty,\ 0 < z < \infty)$$

find the distribution of the random variable X + Y + Z.

30. If $X_1, X_2, \ldots, X_n$ are mutually independent random variables each uniformly
distributed over (0, 1), prove that the density function of $X = X_1 X_2 \cdots X_n$
is

$$\frac{1}{(n-1)!}\left[\log\left(\frac{1}{x}\right)\right]^{n-1} \quad (0 < x < 1)$$

Hence deduce the density function of the geometric mean $(X_1 X_2 \cdots X_n)^{1/n}$.

31. The numbers $X_1, X_2, \ldots, X_n$ are independently chosen at random in the
interval (a, b). Prove that the probability density function of the random
variable $X = \min(X_1, X_2, \ldots, X_n)$ is given by

$$\frac{n(b - x)^{n-1}}{(b - a)^n} \quad (a < x < b)$$


CHAPTER 7
MATHEMATICAL EXPECTATIONS I 
7.1 MATHEMATICAL EXPECTATION OR MEAN VALUE 


It is sometimes worthwhile to reckon the salient features of a mass
distribution in terms of a number of typical values. For example, if
we know the centre of gravity and the moments of inertia about any
three mutually perpendicular lines of a mass distribution in space, we
certainly get some rough idea regarding the distribution. In the
theory of probability also, we shall be interested in obtaining a number
of such typical values which will be called the characteristics of the
distribution. For this, we start with the definition of what is called
mathematical expectation or mean value.


If g(x) is a continuous function, then we know that the distribution
of the random variable g(X) is completely determined by that of X,
and we define the mathematical expectation or the mean value of the
function g(X) of the random variable X, to be denoted by $E\{g(X)\}$, by

$$E\{g(X)\} = \begin{cases} \displaystyle\sum_{i=-\infty}^{\infty} g(x_i)\, f_i & \text{for a discrete distribution} \\[2ex] \displaystyle\int_{-\infty}^{\infty} g(x)\, f(x)\, dx & \text{for a continuous distribution} \end{cases} \tag{7.1.1}$$

provided the series or the infinite integral converges absolutely. This
implies that if the series or integral in question is not absolutely
convergent (even if it is convergent but not absolutely so), we shall
say that the expectation does not exist. We note that $E\{g(X)\}$ is a
constant associated with the distribution of X and the function g(x).
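
For a finite spectrum the discrete half of (7.1.1) is a plain weighted sum, and absolute convergence is automatic. The following sketch uses a fair die as an assumed example (it is not taken from the text); the function name `expectation` is likewise an illustrative choice.

```python
from fractions import Fraction

def expectation(g, spectrum, masses):
    """E{g(X)} = sum of g(x_i) * f_i over a finite spectrum, as in (7.1.1)."""
    return sum(g(x) * f for x, f in zip(spectrum, masses))

spectrum = range(1, 7)                 # faces of a fair die (assumed example)
masses = [Fraction(1, 6)] * 6

mean = expectation(lambda x: x, spectrum, masses)
second = expectation(lambda x: x * x, spectrum, masses)
both = expectation(lambda x: x + x * x, spectrum, masses)
print(mean, second, both)
```

Using exact fractions keeps the arithmetic free of rounding, so the additivity of the expectation can be seen to hold exactly in this example.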
Some simple properties

The following simple properties of mathematical expectation may
be easily verified.

1. $E(a) = a$, a being a constant.

2. $E\{a\,g(X)\} = a\,E\{g(X)\}$ (a = constant).

3. $E\{g_1(X) + g_2(X) + \cdots + g_n(X)\} = E\{g_1(X)\} + E\{g_2(X)\} + \cdots + E\{g_n(X)\}$

4. $|E\{g(X)\}| \le E\{|g(X)|\}$

5. If $g(x) \ge 0$ everywhere, then $E\{g(X)\} \ge 0$.

6. If $g(x) \ge 0$ everywhere and $E\{g(X)\} = 0$, then $g(X) = 0$, i.e. the
random variable g(X) has a one-point distribution at 0.


Remark. If we set Y = g(X), we get two definitions of the
mathematical expectation of this random variable: the first,
calculated from the distribution of X, is represented by $E\{g(X)\}$,
and the second, E(Y), is calculated from the distribution of Y.
Now in order that our definition of mathematical expectation is
unambiguous, these two numbers must be the same, i.e. $E\{g(X)\} = E(Y)$.
This result can indeed be proved to be true for any continuous
function g(x). Let us here prove it for a continuous distribution
under the simplifying assumption that g(x) is a continuously differentiable
and strictly monotonic, say, increasing function. We have

$$E(Y) = \int_{-\infty}^{\infty} y\, f_y(y)\, dy$$

Putting y = g(x) and using (5.9.1)

$$E(Y) = \int_{-\infty}^{\infty} g(x)\, f_x(x)\,\frac{dx}{dy}\,\frac{dy}{dx}\, dx = \int_{-\infty}^{\infty} g(x)\, f_x(x)\, dx = E\{g(X)\}$$

Examples 


1. From an urn containing $N_1$ white and $N_2$ black balls
$(N = N_1 + N_2)$, balls are successively drawn without replacement.
Find the mathematical expectation of the number of black balls
preceding the first white ball.

Let X denote the number of black balls preceding the first white
ball. Then the random variable X can take the values $0, 1, 2, \ldots, N_2$,
i.e. $x_i = i$ $(i = 0, 1, 2, \ldots, N_2)$, and the corresponding probability masses
$f_i = P(X = i)$ are given in Ex. 8 Sec. 5.2. Hence

$$E(X) = \sum_{i=1}^{N_2} i\,\frac{N_1 N_2 (N_2 - 1)\cdots(N_2 - i + 1)}{N(N-1)\cdots(N-i)}$$

Since $\sum f_i = 1$, we get

$$\sum_{i=0}^{N_2} \frac{N_2 (N_2 - 1)\cdots(N_2 - i + 1)}{(N-1)(N-2)\cdots(N-i)} = \frac{N}{N_1}$$

which is an identity in $N_1, N_2$. Replacing $N_1$ by $N_1 + 1$ in this
identity we get another identity:

$$\sum_{i=0}^{N_2} \frac{N_2 (N_2 - 1)\cdots(N_2 - i + 1)}{N(N-1)\cdots(N-i+1)} = \frac{N+1}{N_1+1}$$

Taking the difference of these two identities we have

$$E(X) = \frac{N_2}{N_1 + 1}$$

2. If r tickets are drawn successively with replacement from an
urn containing n tickets numbered $1, 2, \ldots, n$, then find the expectation
of the greatest number drawn.

The distribution of X, the greatest number drawn, is given in
Ex. 7 Sec. 5.2. This gives

$$E(X) = n^{-r}\sum_{i=1}^{n} i\,\{i^r - (i-1)^r\} = n^{-r}\sum_{i=1}^{n} \{i^{r+1} - (i-1)^{r+1} - (i-1)^r\} = n - n^{-r}\sum_{i=1}^{n} (i-1)^r$$

If n is large

$$n^{-r}\sum_{i=1}^{n} (i-1)^r = n\cdot\frac{1}{n}\sum_{i=1}^{n}\left(\frac{i-1}{n}\right)^r \simeq n\int_0^1 x^r\, dx = \frac{n}{r+1}$$

so that

$$E(X) \simeq \frac{nr}{r+1}$$
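
Both results can be confirmed by summing the probability masses directly. The sketch below builds the masses from the drawing scheme itself (a run of black balls followed by a white one; the greatest of r draws being i) rather than quoting Sec. 5.2; all function names and the numerical values are illustrative assumptions.

```python
from fractions import Fraction

def black_before_white_pmf(n1, n2):
    """P(X = i): i black balls and then a white one, drawing without replacement."""
    n = n1 + n2
    pmf = []
    for i in range(n2 + 1):
        prob = Fraction(1)
        for j in range(i):                       # i black balls in succession
            prob *= Fraction(n2 - j, n - j)
        prob *= Fraction(n1, n - i)              # followed by a white ball
        pmf.append(prob)
    return pmf

def expected_black_before_white(n1, n2):
    return sum(i * f for i, f in enumerate(black_before_white_pmf(n1, n2)))

def expected_greatest_ticket(n, r):
    """E(X) for the greatest of r draws with replacement from tickets 1..n."""
    return sum(Fraction(i * (i**r - (i - 1) ** r), n**r) for i in range(1, n + 1))

print(expected_black_before_white(5, 7), Fraction(7, 5 + 1))
print(float(expected_greatest_ticket(1000, 3)), 1000 * 3 / (3 + 1))
```

In the second line the exact expectation and the large-n approximation differ slightly, as expected, since the integral only approximates the sum.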


3. A point is chosen at random on a circle of radius a. Compute
the mathematical expectation of its distance from a fixed point also
on the circle.

Let O be the centre of the circle, A the fixed point and P the
random point on the circle. Let X denote the angle POA. By
question, the random variable X is uniformly distributed over the
interval $(0, 2\pi)$, i.e. the probability density function is given by

$$f(x) = 1/2\pi \quad (0 < x < 2\pi)$$

Now $PA = 2a \sin \tfrac{1}{2}X$, and its expectation is

$$E(2a \sin \tfrac{1}{2}X) = \frac{a}{\pi}\int_0^{2\pi} \sin \tfrac{1}{2}x\, dx = \frac{4a}{\pi}$$


7.2 MEAN 


The mean of X or that of the corresponding distribution is naturally
defined to be E(X) and is often denoted by the special symbol m(X) or
$m_x$ or simply m, i.e.

$$m = E(X) \tag{7.2.1}$$

The mean has an important physical significance, viz. it represents
the centre of mass of the probability mass distribution. The mean
thus gives a rough position of the bulk of the distribution and, as such,
is called a measure of location. The mean, however, is not the only
measure of location; there are other measures of location as well,
some of which will be defined later.
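Examples

1. BINOMIAL DISTRIBUTION

$$m = \sum_{i=0}^{n} i\binom{n}{i} p^i (1-p)^{n-i} = np\sum_{i=1}^{n}\binom{n-1}{i-1} p^{i-1}(1-p)^{n-i} = np$$

2. POISSON DISTRIBUTION

$$m = \sum_{i=0}^{\infty} i\, e^{-\mu}\frac{\mu^i}{i!} = \mu e^{-\mu}\sum_{i=1}^{\infty}\frac{\mu^{i-1}}{(i-1)!} = \mu e^{-\mu} e^{\mu} = \mu$$

3. NORMAL DISTRIBUTION

$$\begin{aligned}
m(X) &= \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{\infty} x\, e^{-(x-m)^2/2\sigma^2}\, dx \\
&= \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{\infty} (x-m)\, e^{-(x-m)^2/2\sigma^2}\, dx + \frac{m}{\sigma\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-(x-m)^2/2\sigma^2}\, dx \\
&= \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{\infty} x\, e^{-x^2/2\sigma^2}\, dx + m = 0 + m = m
\end{aligned}$$

Moreover, the integral $\int_{-\infty}^{\infty} x\, e^{-x^2/2\sigma^2}\, dx$, and hence the integral representing
m(X), is absolutely convergent. Therefore, the mean exists and
is equal to m. Thus we see that the parameter m has the natural
interpretation of being the mean of the distribution.

4. CAUCHY DISTRIBUTION. This is an example in which the
mean does not exist.

$$\frac{\lambda}{\pi}\int_{-\infty}^{\infty}\frac{x\, dx}{\lambda^2 + (x-\mu)^2} = \frac{\lambda}{\pi}\int_{-\infty}^{\infty}\frac{(x-\mu)\, dx}{\lambda^2 + (x-\mu)^2} + \mu = \frac{\lambda}{\pi}\int_{-\infty}^{\infty}\frac{x\, dx}{\lambda^2 + x^2} + \mu$$

Since the last integral is not absolutely convergent, the mean does not
exist. (The integral is also not convergent, but its Cauchy principal
value exists which is zero.)

5. GAMMA DISTRIBUTION

$$m = \frac{1}{\Gamma(l)}\int_0^{\infty} x\, e^{-x} x^{l-1}\, dx = \frac{1}{\Gamma(l)}\int_0^{\infty} e^{-x} x^{l}\, dx = \frac{\Gamma(l+1)}{\Gamma(l)} = l$$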
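
The failure of the Cauchy mean to exist can be made concrete. The sketch below takes the case $\lambda = 1$, $\mu = 0$ as an assumed special case and uses the elementary antiderivative $\log(1+x^2)/(2\pi)$ of $x/\{\pi(1+x^2)\}$; the function names are illustrative. It shows that the truncated integral depends on how the two limits tend to infinity, and that the integral of $|x| f(x)$ grows without bound.

```python
import math

def truncated_mean(a, b):
    """Integral of x / (pi (1 + x^2)) over (-a, b), from the antiderivative."""
    return (math.log1p(b * b) - math.log1p(a * a)) / (2 * math.pi)

def truncated_abs_mean(a):
    """Integral of |x| / (pi (1 + x^2)) over (-a, a)."""
    return math.log1p(a * a) / math.pi

for a in (1e2, 1e4, 1e6):
    # symmetric limits, lopsided limits, and the absolute integral
    print(a, truncated_mean(a, a), truncated_mean(a, 2 * a), truncated_abs_mean(a))
```

With symmetric limits the value is zero (the Cauchy principal value), with limits $-a$ and $2a$ it tends to $(\log 2)/\pi$, and the absolute integral keeps increasing, which is why the expectation is said not to exist.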


7.3 MOMENTS 


Let k be a non-negative integer. The moment of order k or the
kth moment of X about a fixed point a is defined to be the mean value
$E\{(X - a)^k\}$.

$E\{|X - a|^k\}$ will be called the kth absolute moment of X about a.
It is to be noted that the existence of the kth moment implies the
existence of the kth absolute moment, since the expectation is defined
only when the series or integral converges absolutely. Again if the kth moment
exists, we can prove that the (k-1)th moment also exists. This
follows easily from the theories of convergence of series and infinite
integrals, in view of the inequality $|x - a|^{k-1} \le |x - a|^k + 1$ for all x.

Hence if the kth moment exists, all moments of order less than
k exist.

The kth moment about the origin, which is often simply called
the kth moment of X or its distribution, will be denoted by $\alpha_k(X)$ or
$\alpha_{xk}$ or $\alpha_k$, i.e.

$$\alpha_k = E(X^k) \tag{7.3.1}$$

Clearly $\alpha_0 = 1$, $\alpha_1 = m$, i.e. the mean is the first moment of the distribution.

Of special interest are the moments about the mean which are also
called the central moments. The kth central moment $\mu_k(X)$ or $\mu_{xk}$
or $\mu_k$ is then given by

$$\mu_k = E\{(X - m)^k\} \tag{7.3.2}$$

We have $\mu_0 = 1$, $\mu_1 = 0$ for all random variables.

The central moments $\mu_k$ may be expressed in terms of the ordinary
moments of order $\le k$ as follows.

$$(X - m)^k = \sum_{i=0}^{k} (-1)^i \binom{k}{i} X^{k-i} m^i$$

which gives

$$\mu_k = \sum_{i=0}^{k} (-1)^i \binom{k}{i} \alpha_{k-i}\, m^i \tag{7.3.3}$$

Noting $\alpha_0 = 1$ and $\alpha_1 = m$, we get

$$\begin{aligned}
\mu_2 &= \alpha_2 - m^2 \\
\mu_3 &= \alpha_3 - 3\alpha_2 m + 2m^3 \\
\mu_4 &= \alpha_4 - 4\alpha_3 m + 6\alpha_2 m^2 - 3m^4
\end{aligned} \tag{7.3.4}$$
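
Formula (7.3.3) translates directly into a short routine. The sketch below applies it to a fair die, an assumed example that does not appear in the text, and compares the result with central moments computed straight from the definition (7.3.2); all names are illustrative.

```python
from fractions import Fraction
from math import comb

def raw_moments(spectrum, masses, order):
    """alpha_0 .. alpha_order: moments about the origin of a discrete distribution."""
    return [sum(Fraction(x) ** k * f for x, f in zip(spectrum, masses))
            for k in range(order + 1)]

def central_from_raw(alpha):
    """mu_k = sum_i (-1)^i C(k, i) alpha_{k-i} m^i, following (7.3.3)."""
    m = alpha[1]
    return [sum((-1) ** i * comb(k, i) * alpha[k - i] * m**i for i in range(k + 1))
            for k in range(len(alpha))]

def central_direct(spectrum, masses, order):
    """mu_k = E{(X - m)^k}, computed from the definition (7.3.2)."""
    m = sum(x * f for x, f in zip(spectrum, masses))
    return [sum((x - m) ** k * f for x, f in zip(spectrum, masses))
            for k in range(order + 1)]

spectrum = [1, 2, 3, 4, 5, 6]            # fair die (assumed example)
masses = [Fraction(1, 6)] * 6
alpha = raw_moments(spectrum, masses, 4)
mu = central_from_raw(alpha)
print(mu)
print(central_direct(spectrum, masses, 4))
```

The two printed lists coincide, and the vanishing third central moment reflects the symmetry of the die about its mean.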


We can calculate the moments of a continuous function g(X) of
X from the distribution of X as follows.

Setting Y = g(X), $\alpha_k(Y) = E[\{g(X)\}^k]$ or

$$\alpha_k\{g(X)\} = E[\{g(X)\}^k] \tag{7.3.5}$$

For the central moments, we note

$$m_y = E(Y) = E\{g(X)\} \tag{7.3.6}$$

Hence

$$\mu_k(Y) = E\{(Y - m_y)^k\} = E[\{g(X) - m_y\}^k]$$

or

$$\mu_k\{g(X)\} = E[\{g(X) - m_y\}^k] \tag{7.3.7}$$

where $m_y$ is given by (7.3.6).

If Y = aX + b, a, b being constants, $m_y = a m_x + b$. So

$$\mu_k(aX + b) = E\{(aX + b - a m_x - b)^k\} = a^k E\{(X - m_x)^k\}$$

or

$$\mu_k(aX + b) = a^k \mu_k(X) \tag{7.3.8}$$


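Examples

1. NORMAL DISTRIBUTION. Since the mean is m,

$$\begin{aligned}
\mu_k &= \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{\infty} (x-m)^k\, e^{-(x-m)^2/2\sigma^2}\, dx \\
&= \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{\infty} (x-m)^{k-1}\,(x-m)\, e^{-(x-m)^2/2\sigma^2}\, dx \\
&= \frac{1}{\sigma\sqrt{2\pi}}\left\{\Big[-\sigma^2 (x-m)^{k-1} e^{-(x-m)^2/2\sigma^2}\Big]_{-\infty}^{\infty} + (k-1)\sigma^2\int_{-\infty}^{\infty} (x-m)^{k-2}\, e^{-(x-m)^2/2\sigma^2}\, dx\right\} \\
&= (k-1)\sigma^2\cdot\frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{\infty} (x-m)^{k-2}\, e^{-(x-m)^2/2\sigma^2}\, dx
\end{aligned}$$

or

$$\mu_k = (k-1)\sigma^2 \mu_{k-2}$$

Since $\mu_0 = 1$, $\mu_1 = 0$, it follows that

$$\mu_{2k+1} = 0,\qquad \mu_{2k} = 1\cdot 3\cdot 5\cdots(2k-1)\,\sigma^{2k}$$

2. GAMMA DISTRIBUTION

$$\alpha_k = \frac{1}{\Gamma(l)}\int_0^{\infty} x^k e^{-x} x^{l-1}\, dx = \frac{1}{\Gamma(l)}\int_0^{\infty} e^{-x} x^{l+k-1}\, dx = \frac{\Gamma(l+k)}{\Gamma(l)} = l(l+1)(l+2)\cdots(l+k-1)$$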
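
The recurrence $\mu_k = (k-1)\sigma^2\mu_{k-2}$ of Example 1 can be run mechanically and compared with a direct numerical evaluation of the defining integral. The sketch below is illustrative only: the trapezoidal rule, the truncation of the range at twelve standard deviations and the value of $\sigma$ are assumptions made for the check.

```python
import math

def normal_central_moments(sigma, order):
    """mu_0 .. mu_order from mu_k = (k-1) sigma^2 mu_{k-2}, with mu_0 = 1, mu_1 = 0."""
    mu = [1.0, 0.0]
    for k in range(2, order + 1):
        mu.append((k - 1) * sigma**2 * mu[k - 2])
    return mu[: order + 1]

def central_moment_by_quadrature(sigma, k, m=0.0, half_width=12.0, steps=20000):
    """Trapezoidal estimate of the integral defining mu_k (numerical cross-check)."""
    a = m - half_width * sigma
    h = 2 * half_width * sigma / steps
    total = 0.0
    for j in range(steps + 1):
        x = a + j * h
        weight = 0.5 if j in (0, steps) else 1.0
        total += weight * (x - m) ** k * math.exp(-((x - m) ** 2) / (2 * sigma**2))
    return total * h / (sigma * math.sqrt(2 * math.pi))

sigma = 1.5                               # illustrative value
mu = normal_central_moments(sigma, 6)
for k in range(7):
    print(k, mu[k], central_moment_by_quadrature(sigma, k))
```

The odd moments come out as zero in both columns and the even ones follow the pattern $\sigma^2,\ 3\sigma^4,\ 15\sigma^6$.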


7.4 VARIANCE 


The second central moment $\mu_2$ is of great importance and is called
the variance of X written as var(X), i.e.

$$\mathrm{var}(X) = \mu_2 = E\{(X - m)^2\} \tag{7.4.1}$$

It is clear from the definition that the variance is a characteristic
which describes how widely the probability masses are spread about
the mean or, in other words, an inverse measure of concentration of
the probability masses about the mean. Such a measure will be
called a measure of dispersion. The physical interpretation of the
variance is that it represents the moment of inertia of the probability
mass distribution about a line through the mean perpendicular to the
line of the distribution.


Now the variance is a quantity whose unit is the square of the unit of
the random variable, and it is sometimes more convenient to have a
measure of dispersion having the same unit as that of the random
variable. This is obtained by taking the positive square root of the
variance which is called the standard deviation of X to be denoted
by $\sigma(X)$ or $\sigma_x$ or $\sigma$, i.e.

$$\sigma = +\sqrt{\mathrm{var}(X)} \tag{7.4.2}$$
1. Since $(x - m)^2 \ge 0$ for all x, var(X) = 0 implies X = m, i.e.
the whole mass is concentrated at the mean.

2. The second moment about any point is minimum when taken
about the mean.

Proof.

$$(X - a)^2 = \{(X - m) + (m - a)\}^2 = (X - m)^2 + 2(m - a)(X - m) + (m - a)^2$$

$$E\{(X - a)^2\} = E\{(X - m)^2\} + 2(m - a)E(X - m) + (m - a)^2 = \mu_2 + (m - a)^2 \ge \mu_2$$

3. The following formulae will be sometimes useful for evaluating
the variance:

$$\sigma^2 = \alpha_2 - m^2 \tag{7.4.3}$$

$$\sigma^2 = E\{X(X - 1)\} - m(m - 1) \tag{7.4.4}$$

Proof. (7.4.3) is nothing but the first equation of the set (7.3.4).
For (7.4.4) we note

$$(X - m)^2 = X(X - 1) - 2mX + X + m^2$$

Hence

$$\sigma^2 = E\{X(X - 1)\} - 2mE(X) + E(X) + m^2 = E\{X(X - 1)\} - 2m^2 + m + m^2 = \text{R.H.S. of (7.4.4)}$$

4. For k = 2, (7.3.8) gives

$$\mathrm{var}(aX + b) = a^2\,\mathrm{var}(X) \tag{7.4.5}$$

Putting a = 0, it follows that the variance of a constant is zero.
In terms of standard deviations (7.4.5) may be written in the
form:

$$\sigma(aX + b) = |a|\,\sigma(X) \tag{7.4.6}$$




Standardised or normalised random variable. For any random
variable X, if we set

$$X^* = \frac{X - m}{\sigma} \tag{7.4.7}$$

then $m(X^*) = 0$, $\sigma(X^*) = 1$, and $X^*$ is called the standardised or normalised
random variable corresponding to X. The standardised random
variable, we note, is dimensionless, and as such all its moments are
also so, and since its mean is zero, its central moments coincide with
the corresponding ordinary moments.

It follows that the standardised variable corresponding to a linear
function aX + b is $(a/|a|)X^*$, which reduces simply to $X^*$ if a > 0.


Examples

1. BINOMIAL DISTRIBUTION. In this case the formula (7.4.4)
will be convenient, and we have

$$E\{X(X - 1)\} = \sum_{i=0}^{n} i(i-1)\binom{n}{i} p^i (1-p)^{n-i} = n(n-1)p^2\sum_{i=2}^{n}\binom{n-2}{i-2} p^{i-2}(1-p)^{n-i} = n(n-1)p^2$$

From Ex. 1 Sec. 7.2 m = np, so that

$$\sigma^2 = n(n-1)p^2 - np(np - 1) = np(1 - p)$$

2. POISSON DISTRIBUTION

$$E\{X(X - 1)\} = \sum_{i=0}^{\infty} i(i-1)\, e^{-\mu}\frac{\mu^i}{i!} = \mu^2 e^{-\mu}\sum_{i=2}^{\infty}\frac{\mu^{i-2}}{(i-2)!} = \mu^2$$

Hence

$$\sigma^2 = \mu^2 - \mu(\mu - 1) = \mu,\qquad \sigma = \sqrt{\mu}$$

3. NORMAL DISTRIBUTION. In Ex. 1 Sec. 7.3 we deduced the
general formula for $\mu_{2k}$ from which $\mu_2 = \sigma^2$. Hence the parameter $\sigma$
of the normal distribution denotes nothing but its standard deviation.

4. GAMMA DISTRIBUTION. By Ex. 2 Sec. 7.3 $\alpha_2 = l(l + 1)$. Therefore
by (7.4.3)

$$\sigma^2 = l(l + 1) - l^2 = l,\qquad \sigma = \sqrt{l}$$

5. CAUCHY DISTRIBUTION. Since the mean of this distribution
is non-existent, the variance necessarily does not exist. We note
further that $E\{(X - \mu)^2\}$, the second moment of X about $\mu$, is infinitely
large.
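
The binomial and Poisson variances found in Examples 1 and 2 can be reproduced by applying formula (7.4.4) to the probability masses themselves. In the sketch below the function name, the parameter values and the truncation of the Poisson series after 200 terms are illustrative assumptions.

```python
from math import comb, exp, isclose

def variance_via_factorial_moment(pairs):
    """sigma^2 = E{X(X-1)} - m(m-1), formula (7.4.4); pairs are (value, mass)."""
    m = sum(x * f for x, f in pairs)
    second_factorial = sum(x * (x - 1) * f for x, f in pairs)
    return second_factorial - m * (m - 1)

n, p = 12, 0.35                                   # illustrative binomial parameters
binomial = [(i, comb(n, i) * p**i * (1 - p) ** (n - i)) for i in range(n + 1)]

mu = 3.2                                          # illustrative Poisson parameter
poisson, mass = [], exp(-mu)
for i in range(200):      # truncated series; the neglected tail is negligible here
    poisson.append((i, mass))
    mass *= mu / (i + 1)  # e^{-mu} mu^i / i!  ->  e^{-mu} mu^{i+1} / (i+1)!

print(variance_via_factorial_moment(binomial), n * p * (1 - p))
print(variance_via_factorial_moment(poisson), mu)
```

Formula (7.4.4) is convenient for these two distributions because the factor i(i-1) cancels against the factorial in the probability mass.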


7.5 THIRD CENTRAL MOMENT 


It is sometimes necessary to study the degree of lack of symmetry
of a distribution, and the third central moment provides a measure for
this asymmetry or skewness. We note that, for a symmetrical distribution,
the point about which the distribution is symmetrical naturally
becomes the mean, and all central moments of odd order are zero,
i.e. $\mu_{2k+1} = 0$ $(k = 0, 1, 2, \ldots)$. The first central moment $\mu_1$, however,
vanishes for all distributions. Hence we may take $\mu_3$ to be a measure
of skewness. Now it is desirable to have a dimensionless measure
describing such a property as asymmetry or skewness. This is obtained
by considering the third moment of the standardised variate
$X^*$, viz. $\mu_3/\sigma^3$, and we define the coefficient of skewness $\gamma_1$ by

$$\gamma_1 = \frac{\mu_3}{\sigma^3} \tag{7.5.1}$$

When $\gamma_1 > 0$ the density curve for a continuous variate may be roughly
described as having a longer 'tail' on the positive or the right side
than on the negative or the left side, and conversely for negative
skewness. The figure below (Fig. 16) illustrates three density curves
having negative, positive and zero skewnesses.

Fig. 16. (a) Negative Skewness (b) Positive Skewness (c) Symmetrical

Remark. If, for an unsymmetrical distribution, $\mu_3$ happens to be
zero, then we may take $\mu_5/\sigma^5$ or, speaking generally, the first non-vanishing
odd order moment of the standardised variate as a measure
of skewness.

Example. GAMMA DISTRIBUTION. From Ex. 2 Sec. 7.3

$$\alpha_2 = l(l + 1),\qquad \alpha_3 = l(l + 1)(l + 2)$$

By (7.3.4)

$$\mu_3 = 2l,\qquad \gamma_1 = 2l/l^{3/2} = 2/\sqrt{l}$$

7.6 FOURTH CENTRAL MOMENT 


The fourth central moment $\mu_4$ or rather the dimensionless quantity
$\mu_4/\sigma^4$, which is the fourth moment of the standardised variate, is sometimes
used to measure the degree of peakedness of the density curve of
a continuous distribution near its centre (the interpretation for the
discrete case in terms of the probability diagram being similar). This
property is called kurtosis, and the coefficient of kurtosis $\beta_2$ is defined
by

$$\beta_2 = \frac{\mu_4}{\sigma^4} \tag{7.6.1}$$

For the normal distribution $\mu_4 = 3\sigma^4$ (cf. Ex. 1 Sec. 7.3) or $\beta_2 = 3$.

Now it is customary to measure kurtosis of any distribution as
compared to the normal distribution which is treated as an ideal in
the field of applications, and we call the quantity

$$\gamma_2 = \beta_2 - 3 = \frac{\mu_4}{\sigma^4} - 3 \tag{7.6.2}$$

the coefficient of excess of kurtosis or simply the coefficient of excess
of the distribution. Thus a density curve with positive excess will
have a sharper peak than the normal density curve, and the
opposite for negative excess. Fig. 17 shows two symmetrical standardised
(i.e. m = 0, $\sigma$ = 1) density curves having negative and positive
excesses together with the standard normal density curve.

Remark. It cannot, however, be mathematically proved that $\beta_2$
really gives a measure of peakedness of the density curve. On the
contrary, we can construct examples of density curves having very
sharp peaks but low values of $\beta_2$ or quite flat tops with high values of
$\beta_2$. But it must be remarked that such examples are rare, and the
above statements hold good for the majority of the distributions met
in practice.

Fig. 17. (a) Negative Excess (b) Positive Excess (c) Normal Curve

Example. GAMMA DISTRIBUTION. By (7.3.4) we have using the
results of Ex. 2 Sec. 7.3

$$\mu_4 = l(l + 1)(l + 2)(l + 3) - 4l(l + 1)(l + 2)\,l + 6l(l + 1)\,l^2 - 3l^4 = 3l(l + 2)$$

$$\beta_2 = 3l(l + 2)/l^2 = 6/l + 3,\qquad \gamma_2 = 6/l$$
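
The coefficients of skewness and excess for the gamma distribution can be recomputed from its moments about the origin by way of (7.3.4), (7.5.1) and (7.6.2). The sketch below is illustrative; the function names and the chosen value of l are assumptions.

```python
from fractions import Fraction

def central_2_3_4(a1, a2, a3, a4):
    """mu_2, mu_3, mu_4 from the moments about the origin, following (7.3.4)."""
    m = a1
    mu2 = a2 - m**2
    mu3 = a3 - 3 * a2 * m + 2 * m**3
    mu4 = a4 - 4 * a3 * m + 6 * a2 * m**2 - 3 * m**4
    return mu2, mu3, mu4

def skewness_and_excess(a1, a2, a3, a4):
    """gamma_1 = mu_3 / sigma^3 and gamma_2 = mu_4 / sigma^4 - 3."""
    mu2, mu3, mu4 = central_2_3_4(a1, a2, a3, a4)
    gamma1 = float(mu3) / float(mu2) ** 1.5
    gamma2 = float(mu4) / float(mu2) ** 2 - 3
    return gamma1, gamma2

l = Fraction(4)                                   # illustrative gamma parameter
# alpha_k = l (l+1) ... (l+k-1), from Ex. 2 Sec. 7.3
alpha = [l, l * (l + 1), l * (l + 1) * (l + 2), l * (l + 1) * (l + 2) * (l + 3)]
g1, g2 = skewness_and_excess(*alpha)
print(central_2_3_4(*alpha), g1, g2)
```

Both coefficients decrease as l grows, so the gamma density becomes more nearly symmetrical and closer to the normal shape for large l.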


7.7 MOMENT GENERATING FUNCTION

The moment generating function of a random variable X is a
function of a real variable t denoted by $\psi_x(t)$ or $\psi(t)$ and defined by

$$\psi(t) = E(e^{tX}) \tag{7.7.1}$$

Now, as to the question of convergence of the sum or integral
representing $\psi(t)$, we face some difficulties. For t = 0, $\psi(t)$ certainly
exists, and $\psi(0) = 1$; but for non-zero values of t the series or integral
concerned does not converge for all distributions.

Differentiating the series representation of $\psi(t)$ for the discrete case
term by term or its integral representation for the continuous case
within the sign of integration k times at t = 0, we get

$$\psi^{(k)}(0) = E(X^k) = \alpha_k \tag{7.7.2}$$

assuming, however, the validity of such a process. Now a necessary
condition for defining the derivatives of $\psi(t)$ at t = 0 is that $\psi(t)$ should
exist in a small neighbourhood of t = 0. It can be proved that this
simple condition is also sufficient for the existence of moments of all
orders and the holding of (7.7.2).

It follows from (7.7.2) that the power series development of $\psi(t)$
will be

$$\psi(t) = \sum_{k=0}^{\infty} \frac{\alpha_k}{k!}\, t^k \tag{7.7.3}$$

Thus if $\psi(t)$ for any distribution is expanded by algebraic or other
methods in a power series, the coefficient of $t^k$ will be $\alpha_k/k!$ and hence
the name moment generating function.
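
Relation (7.7.2) can be tried out numerically. The sketch below sums $E(e^{tX})$ directly from the Poisson probability masses and estimates the first two derivatives at t = 0 by central differences; the step h, the truncation of the series and the function names are assumptions made for the illustration, and the finite differences are only approximate.

```python
import math

def poisson_mgf(t, mu, terms=200):
    """E(e^{tX}) summed directly from the Poisson masses e^{-mu} mu^i / i!."""
    total, mass = 0.0, math.exp(-mu)
    for i in range(terms):
        total += math.exp(t * i) * mass
        mass *= mu / (i + 1)
    return total

def derivatives_at_zero(psi, h=1e-3):
    """Central-difference estimates of psi'(0) and psi''(0)."""
    first = (psi(h) - psi(-h)) / (2 * h)
    second = (psi(h) - 2 * psi(0.0) + psi(-h)) / h**2
    return first, second

mu = 2.0                                           # illustrative parameter
a1, a2 = derivatives_at_zero(lambda t: poisson_mgf(t, mu))
print(a1, a2)   # to be compared with alpha_1 = mu and alpha_2 = sigma^2 + m^2
```

By Ex. 2 of Secs. 7.2 and 7.4 together with (7.4.3), the values to compare with are $\alpha_1 = \mu$ and $\alpha_2 = \mu + \mu^2$.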


On account of the above-mentioned convergence difficulties, the
moment generating function is rather getting out of use and is conveniently
replaced by what is known as the characteristic function which
exists for all distributions.
7.8. CHARACTERISTIC FUNCTION 
Replacing t by it (i= /-1) in (0, we obtain the characteristic 
function of X to be denoted by x, (f) or x(t), i.e. 
X() = E(e'¥) = Efcos(t X) + i sin(tx)} (7.8.1) 


The characteristic function is thus a complex-valued function of a real 


variable f. Since |еЧ®| =1, it may be easily seen that x(f) exists for 
all values of t and for all distributions, 


7.8] CHARACTERISTIC FUNCTION 141 


The existence of the characteristic function does not, however, 
guarantee the existence of the moments of the distribution. This we 
can well guess from the fact that the characteristic function exists for 
all distributions, but all distributions do not possess moments of 
all orders. But if the moment o; exists, it can be proved that we are 


permitted to differentiate E(e’*) k times at t — 0 giving 
x(0) = ал (7.8.2) 


Hence developing x(t) in a power series of it, we may write formally 
(without any regard to its convergence) 


E 
х= 25 аз (иу (7.8.3) 
k-0 
so that the coefficient of (i/)* in the expansion of Х(0) in powers of it 
is axlk !. 

The most important property of the characteristic function is yet to be stated. We note that, given any distribution function $F(x)$, the corresponding characteristic function $\chi(t)$ is uniquely determined by the definition (7.8.1). Now the fundamental theorem concerning characteristic functions states that the converse is also true, i.e. the characteristic function $\chi(t)$ also uniquely determines the distribution function $F(x)$. The proof of this theorem will, however, be too difficult for us and will be omitted. But we shall make many important applications of this theorem in the following form. If, by means of some indirect method, we find the characteristic function of an unknown distribution and if this coincides with the characteristic function of a known distribution, then we can at once identify the unknown distribution with the known distribution.

The characteristic function of a continuous function $g(X)$ of $X$ may be obtained as follows. Setting $Y = g(X)$

$$\chi_y(t) = E(e^{itY}) = E\{e^{itg(X)}\} \tag{7.8.4}$$

For a linear function $aX + b$

$$\chi_{aX+b}(t) = e^{ibt}\,\chi_x(at) \tag{7.8.5}$$

and, in particular, for the standardised variate $X^* = (X - m)/\sigma$

$$\chi_{X^*}(t) = e^{-imt/\sigma}\,\chi_x(t/\sigma) \tag{7.8.6}$$




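Examples

1. BINOMIAL DISTRIBUTION

$$\chi(t) = \sum_{k=0}^{n} e^{itk}\binom{n}{k} p^k (1-p)^{n-k} = \sum_{k=0}^{n} \binom{n}{k} (pe^{it})^k (1-p)^{n-k} = (pe^{it} + q)^n \qquad [\text{where } q = 1 - p$$

2. POISSON DISTRIBUTION

$$\chi(t) = \sum_{k=0}^{\infty} e^{itk}\, e^{-\mu}\frac{\mu^k}{k!} = e^{-\mu}\sum_{k=0}^{\infty}\frac{(\mu e^{it})^k}{k!} = e^{-\mu}\, e^{\mu e^{it}} = e^{\mu(e^{it} - 1)}$$

3. NORMAL DISTRIBUTION

$$\chi(t) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{itx}\, e^{-(x-m)^2/2\sigma^2}\,dx = e^{imt - \frac{1}{2}\sigma^2 t^2} \cdot \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-(x - m - i\sigma^2 t)^2/2\sigma^2}\,dx = e^{imt - \frac{1}{2}\sigma^2 t^2}$$

4. GAMMA DISTRIBUTION

$$\chi(t) = \frac{1}{\Gamma(l)} \int_0^{\infty} e^{itx}\, e^{-x} x^{l-1}\,dx = \frac{1}{\Gamma(l)} \int_0^{\infty} e^{-(1-it)x}\, x^{l-1}\,dx = (1 - it)^{-l}$$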

7.9 SEMI-INVARIANTS OR CUMULANTS 


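If $\log \chi(t)$ is developed in a power series of $it$ in the form

$$\log \chi(t) = \sum_{k=1}^{\infty} \frac{\kappa_k}{k!}\,(it)^k \tag{7.9.1}$$

the coefficient $\kappa_k$ is called the semi-invariant or cumulant of order $k$ of the random variable $X$. We have

$$\chi(t) = 1 + \sum_{k=1}^{\infty} \frac{\alpha_k}{k!}\,(it)^k$$

and hence

$$\log \chi(t) = \log\left\{1 + \sum_{k=1}^{\infty} \frac{\alpha_k}{k!}\,(it)^k\right\}$$

Expanding this in a formal manner, we write

$$\log \chi(t) = \sum_{k=1}^{\infty} \frac{\alpha_k}{k!}(it)^k - \frac{1}{2}\left\{\sum_{k=1}^{\infty} \frac{\alpha_k}{k!}(it)^k\right\}^2 + \frac{1}{3}\left\{\sum_{k=1}^{\infty} \frac{\alpha_k}{k!}(it)^k\right\}^3 - \cdots$$

or

$$\log \chi(t) = \alpha_1 (it) + (\alpha_2 - \alpha_1^2)\frac{(it)^2}{2!} + (\alpha_3 - 3\alpha_1\alpha_2 + 2\alpha_1^3)\frac{(it)^3}{3!} + (\alpha_4 - 3\alpha_2^2 - 4\alpha_1\alpha_3 + 12\alpha_1^2\alpha_2 - 6\alpha_1^4)\frac{(it)^4}{4!} + \cdots$$

Comparing this expansion with (7.9.1), we obtain the following expressions of the $\kappa$'s in terms of the $\alpha$'s:

$$\kappa_1 = m$$
$$\kappa_2 = \alpha_2 - m^2$$
$$\kappa_3 = \alpha_3 - 3m\alpha_2 + 2m^3 \tag{7.9.2}$$
$$\kappa_4 = \alpha_4 - 3\alpha_2^2 - 4m\alpha_3 + 12m^2\alpha_2 - 6m^4$$

Using (7.3.4) the $\kappa$'s are given in terms of the $\mu$'s by

$$\kappa_2 = \mu_2, \quad \kappa_3 = \mu_3, \quad \kappa_4 = \mu_4 - 3\mu_2^2 \quad \text{etc.} \tag{7.9.3}$$

Hence

$$m = \kappa_1, \quad \sigma = \sqrt{\kappa_2}, \quad \gamma_1 = \kappa_3/\kappa_2^{3/2}, \quad \gamma_2 = \kappa_4/\kappa_2^2 \tag{7.9.4}$$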
To find the semi-invariants of a linear function $aX + b$, we have from (7.8.5)

$$\log \chi_{aX+b}(t) = ibt + \log \chi_x(at) = ibt + \sum_{k=1}^{\infty} \frac{a^k \kappa_k}{k!}\,(it)^k$$

Therefore, the first semi-invariant of $aX + b$ is $a\kappa_1 + b$, and that of order $k\ (>1)$ is $a^k \kappa_k$.


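Examples

1. BINOMIAL DISTRIBUTION

$$\log \chi(t) = n \log(pe^{it} + q) = n \log\left\{1 + p(it) + p\frac{(it)^2}{2!} + p\frac{(it)^3}{3!} + \cdots\right\}$$
$$= np(it) + npq\frac{(it)^2}{2!} + npq(q - p)\frac{(it)^3}{3!} + npq(1 - 6pq)\frac{(it)^4}{4!} + \cdots$$

So

$$\kappa_1 = np, \quad \kappa_2 = npq, \quad \kappa_3 = npq(q - p), \quad \kappa_4 = npq(1 - 6pq)$$

By (7.9.4)

$$m = np, \quad \sigma = \sqrt{npq}, \quad \gamma_1 = \frac{q - p}{\sqrt{npq}} = \frac{1 - 2p}{\sqrt{npq}}, \quad \gamma_2 = \frac{1 - 6pq}{npq}$$

2. POISSON DISTRIBUTION

$$\log \chi(t) = \mu(e^{it} - 1) = \mu \sum_{k=1}^{\infty} \frac{(it)^k}{k!}$$

giving $\kappa_k = \mu$ for all $k$. Hence

$$m = \mu, \quad \sigma = \sqrt{\mu}, \quad \gamma_1 = 1/\sqrt{\mu}, \quad \gamma_2 = 1/\mu$$

3. NORMAL DISTRIBUTION

$$\log \chi(t) = imt - \tfrac{1}{2}\sigma^2 t^2 = m(it) + \sigma^2 \frac{(it)^2}{2!}$$

So

$$\kappa_1 = m, \quad \kappa_2 = \sigma^2, \quad \kappa_3 = \kappa_4 = \cdots = 0$$

which show that $m$ is the mean, $\sigma^2$ the variance of the normal distribution and the coefficients of skewness and excess are both zero.

4. GAMMA DISTRIBUTION

$$\log \chi(t) = -l \log(1 - it) = l \sum_{k=1}^{\infty} \frac{(it)^k}{k}$$

Hence $\kappa_k = (k - 1)!\; l$ and

$$m = l, \quad \sigma = \sqrt{l}, \quad \gamma_1 = 2/\sqrt{l}, \quad \gamma_2 = 6/l$$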
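As a check on the moment-to-cumulant relations (7.9.2), the sketch below computes the first four moments of a binomial distribution exactly and compares the resulting cumulants with the binomial values $np$, $npq$, $npq(q-p)$, $npq(1-6pq)$. The function names and the choice $n = 5$, $p = 1/3$ are illustrative assumptions.

```python
from fractions import Fraction
from math import comb


def raw_moments_binomial(n, p, order=4):
    """alpha_1 .. alpha_order of the binomial (n, p) distribution, computed exactly."""
    q = 1 - p
    masses = [comb(n, i) * p**i * q ** (n - i) for i in range(n + 1)]
    return [sum(Fraction(i) ** k * f for i, f in enumerate(masses))
            for k in range(1, order + 1)]


def cumulants_from_moments(a1, a2, a3, a4):
    """First four cumulants from the moments about the origin, using (7.9.2)."""
    m = a1
    k1 = m
    k2 = a2 - m**2
    k3 = a3 - 3 * m * a2 + 2 * m**3
    k4 = a4 - 3 * a2**2 - 4 * m * a3 + 12 * m**2 * a2 - 6 * m**4
    return k1, k2, k3, k4


n, p = 5, Fraction(1, 3)  # illustrative values only
q = 1 - p
kappa = cumulants_from_moments(*raw_moments_binomial(n, p))
expected = (n * p, n * p * q, n * p * q * (q - p), n * p * q * (1 - 6 * p * q))
print(kappa == expected)
```

Exact rational arithmetic is used so that the comparison is an identity rather than an approximate agreement.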




We shall now define some other useful characteristics which are 
not defined as mathematical expectations. 


7.10 MEDIAN 


This is another important measure of location denoting the point which divides the probability mass distribution into two halves. Mathematically, we define the median $\mu$ by the equation

$$F(\mu) = \tfrac{1}{2} \tag{7.10.1}$$

The median is thus the x-co-ordinate of the point of intersection of the distribution curve $y = F(x)$ with the straight line $y = \frac{1}{2}$. For a continuous distribution, since $F(x)$ is continuous and strictly monotonic increasing, a unique median exists. But for the discrete case either of the two troubles arises: (i) the straight line $y = \frac{1}{2}$ does not intersect the curve $y = F(x)$ at all, but passes between two horizontal parts of this step curve or (ii) $y = \frac{1}{2}$ intersects $y = F(x)$ at all points for which $x$ lies in an interval $x_k \le x < x_{k+1}$, where $x_k$ and $x_{k+1}$ are two consecutive points of the spectrum, so that every point of this interval satisfies equation (7.10.1) and may claim to be a median.

To cover case (i), we extend our definition of the median as follows: The median $\mu$ is a point which simultaneously satisfies the inequalities

$$F(\mu - 0) \le \tfrac{1}{2}, \quad F(\mu) \ge \tfrac{1}{2} \tag{7.10.2}$$

The inequalities (7.10.2) have an interesting geometrical meaning, viz. the vertical steps of the distribution curve are also regarded as parts of the curve while considering its intersection with the straight line $y = \frac{1}{2}$.

To cover case (ii), we make the convention of taking the middle point of the interval, $\frac{1}{2}(x_k + x_{k+1})$, as the proper median of the distribution.

With these extensions, the median exists for every distribution and is unique.

An important property of the median is contained in the following minimum theorem: The first absolute moment about any point is minimum when taken about the median.




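Proof. Let us prove the theorem for the continuous case, the proof for the discrete case being similar. If $c > \mu$

$$E(|X - c|) = \int_{-\infty}^{c} (c - x) f(x)\,dx + \int_{c}^{\infty} (x - c) f(x)\,dx$$

$$= \int_{-\infty}^{\mu} (c - x) f(x)\,dx + \int_{\mu}^{c} (c - x) f(x)\,dx + \int_{\mu}^{\infty} (x - c) f(x)\,dx - \int_{\mu}^{c} (x - c) f(x)\,dx$$

$$= \int_{-\infty}^{\mu} (\mu - x) f(x)\,dx + \int_{\mu}^{\infty} (x - \mu) f(x)\,dx + (c - \mu)\left\{\int_{-\infty}^{\mu} f(x)\,dx - \int_{\mu}^{\infty} f(x)\,dx\right\} + 2\int_{\mu}^{c} (c - x) f(x)\,dx$$

$$= E(|X - \mu|) + (c - \mu)\{2F(\mu) - 1\} + 2\int_{\mu}^{c} (c - x) f(x)\,dx$$

Using (7.10.1)

$$E(|X - c|) = E(|X - \mu|) + 2\int_{\mu}^{c} (c - x) f(x)\,dx$$

Since $c > \mu$, the integral on the R.H.S. is $\ge 0$, and hence

$$E(|X - c|) \ge E(|X - \mu|)$$

Similarly, for $c < \mu$ we have

$$E(|X - c|) = E(|X - \mu|) + 2\int_{c}^{\mu} (x - c) f(x)\,dx$$

and the above inequality holds.

Hence $E(|X - c|) \ge E(|X - \mu|)$ always, which proves the theorem.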




Now $E(|X - \mu|)$ obviously gives a measure of dispersion about the median $\mu$ and is our natural choice as a dispersion characteristic when the median is selected as the characteristic of location. For a symmetrical distribution, the median lies at the point of symmetry and coincides with the mean if the latter exists.


Examples 

1. Consider a discrete distribution, the spectrum of which consists of the points $0, 1, 2, \ldots, n$ having the same probability mass $1/(n + 1)$ at each point. If $n$ is even, there is no point which satisfies (7.10.1), but, by the extended definition (7.10.2), the spectrum point $\frac{1}{2}n$ is the median. If $n$ is odd, all points of the interval $\frac{1}{2}(n - 1) \le x < \frac{1}{2}(n + 1)$ satisfy (7.10.1) so that, according to our convention, we take the middle point of this interval as the median, i.e. $\mu = \frac{1}{2}n$.

2. NORMAL DISTRIBUTION. It is symmetrical about the mean $m$, and hence $\mu = m$.

3. CAUCHY DISTRIBUTION. This distribution is also symmetrical about the point $\mu$, which shows that $\mu$, as the notation implies, is the median. We remember that for the Cauchy distribution the mean does not exist, but the median does, as it must.
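The extended definition (7.10.2) and the midpoint convention translate into a short procedure for a discrete distribution with finitely many spectrum points. The sketch below is one possible rendering, with hypothetical function names; it assumes the points are given in increasing order with positive masses summing to 1.

```python
from fractions import Fraction


def discrete_median(points, masses):
    """Median under (7.10.2) together with the midpoint convention of case (ii).

    `points` must be sorted in increasing order; `masses` are the matching
    probabilities, given as exact Fractions.
    """
    half = Fraction(1, 2)
    cum = Fraction(0)
    for k, (x, f) in enumerate(zip(points, masses)):
        cum += f  # cum is now F(x_k)
        if cum == half:
            # Case (ii): F(x) = 1/2 on [x_k, x_{k+1}); take the middle point.
            return Fraction(x + points[k + 1], 2)
        if cum > half:
            # Case (i): F jumps across 1/2 at x_k, so x_k satisfies (7.10.2).
            return Fraction(x)
    raise ValueError("masses do not sum to 1")


def uniform_median(n):
    """Median of the equal-mass distribution on 0, 1, ..., n of Example 1."""
    pts = list(range(n + 1))
    return discrete_median(pts, [Fraction(1, n + 1)] * (n + 1))


print(uniform_median(4), uniform_median(5))
```

For even and odd $n$ alike the result agrees with the value $\frac{1}{2}n$ found in Example 1.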


7.11 MODE 

Continuous case. Any point for which the density function $f(x)$ has a maximum is called a mode of the distribution. Now $f(x)$ may have one, two or many points of maximum, and accordingly the distribution is called unimodal, bimodal or multimodal respectively.

Discrete case. A point of the spectrum having the relatively tallest ordinate in the probability diagram will be called a mode, i.e. $x_k$ is a mode if

$$f_k > f_{k-1},\; f_{k+1} \tag{7.11.1}$$

Clearly, there may be more than one mode for a discrete distribution as well.

The mode is sometimes useful as a measure of location, particularly for unimodal distributions. For a unimodal symmetric distribution the mean, if it exists, and the mode are identical. Hence, for any unimodal distribution having mode $M$, the quantity $m - M$ depends on the degree of asymmetry of the distribution, and we define another useful measure of skewness to be

$$\frac{m - M}{\sigma} \tag{7.11.2}$$

Examples

1. GAMMA DISTRIBUTION. Here

$$f'(x) = \frac{e^{-x} x^{l-2}}{\Gamma(l)}\,(l - 1 - x)$$

which vanishes for $x = l - 1$ and $0$ (if $l > 2$). It may be easily seen that the maximum of $f(x)$ corresponds to $x = l - 1$. Hence the distribution is unimodal, and $M = l - 1$.

The measure of skewness (7.11.2) $= 1/\sqrt{l}$, whereas the other measure $\gamma_1 = 2/\sqrt{l}$. In order to avoid such disagreements, some mathematicians take $\frac{1}{2}\gamma_1$ as a coefficient of skewness instead of $\gamma_1$.

2. For a binomial $(2n, \frac{1}{2})$ variate $f_i = \binom{2n}{i}\left(\frac{1}{2}\right)^{2n}$ which, we know, is maximum for $i = n$, and hence $M = n$ which is also the mean of the distribution.


7.12 QUANTILES 


Let $p$ $(0 < p < 1)$ be a given number. The quantile of order $p$, $\zeta_p$, will be defined by

$$F(\zeta_p) = p \tag{7.12.1}$$

with extensions similar to the case of the median. Obviously $\zeta_{1/2} = \mu$. The quantiles $\zeta_{1/4}$ and $\zeta_{3/4}$ are much used in practice and are called the lower and upper quartiles respectively. The quantity $\frac{1}{2}(\zeta_{3/4} - \zeta_{1/4})$ is called the semi-interquartile range or the quartile deviation, which is often used as a measure of dispersion.

The quantile of order $k/10$ is called a decile of order $k$, and the quantile of order $k/100$ a percentile of order $k$. Therefore, the 5th decile is the median, the 25th percentile the lower quartile etc. Given the deciles or better the percentiles, we can get a fairly good idea about the distribution of the probability masses.

Example. CAUCHY DISTRIBUTION. From (5.8.7)

$$F(x) = \frac{1}{\pi} \tan^{-1}\left(\frac{x - \mu}{\lambda}\right) + \frac{1}{2} = \frac{1}{4},\ \frac{3}{4}$$

for $x = \mu - \lambda$, $\mu + \lambda$ respectively. Hence $\zeta_{1/4} = \mu - \lambda$, $\zeta_{3/4} = \mu + \lambda$ and the semi-interquartile range $= \lambda$.
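For a continuous distribution whose distribution function can be inverted, (7.12.1) gives the quantiles in closed form. The sketch below does this for the Cauchy distribution, assuming the usual form $F(x) = \frac{1}{\pi}\tan^{-1}\{(x-\mu)/\lambda\} + \frac{1}{2}$ for the distribution function; the function names and the parameter values are illustrative.

```python
import math


def cauchy_cdf(x, lam, mu):
    """Distribution function of the Cauchy distribution with parameters (lam, mu)."""
    return math.atan((x - mu) / lam) / math.pi + 0.5


def cauchy_quantile(p, lam, mu):
    """Solve F(zeta_p) = p, i.e. (7.12.1), by inverting the arctangent."""
    return mu + lam * math.tan(math.pi * (p - 0.5))


lam, mu = 2.0, 1.0  # illustrative values only
lower = cauchy_quantile(0.25, lam, mu)
upper = cauchy_quantile(0.75, lam, mu)
semi_iqr = (upper - lower) / 2
print(lower, upper, semi_iqr)
```

Up to rounding error the quartiles come out as $\mu - \lambda$ and $\mu + \lambda$, so the semi-interquartile range equals $\lambda$, in agreement with the example.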


7.13 SOME REMARKS 


1. We have introduced three principal measures of location— 
mean, median and mode. Of these only the mean is defined as a 
mathematical expectation, as a result of which the rules of calculation 
of the mean are much simpler than those of the median or the mode. 
But one difficulty with the mean is that it does not always exist. 
Moreover, in certain distributions in which small masses occur at great distances away, the mean, which denotes the centre of mass, is dragged away from the bulk of the distribution. In such cases also the mean is not suitable as a measure of location.


2. If the standard deviation or the first absolute moment about 
the mean or the median is not available as a measure of dispersion, as, 
for example, in the Cauchy distribution, we may conveniently use the 
semi-interquartile range as a measure of dispersion. The latter also 
finds great use in statistical applications. 


3. Although the quantities $\gamma_1$, $\gamma_2$ and the measure of skewness (7.11.2) are dimensionless, there do not exist exact theoretical limits for these characteristics, and this is certainly a disadvantage. In practice, however, these quantities are usually found to be small.


7.14 EXERCISES 


1. If $n$ balls are drawn (a) with replacements or (b) without replacements from an urn containing $N_1$ white and $N_2$ black balls $(n \le N_1 + N_2)$, find the expectation of the number of white balls in the cases (a) and (b).


2. A point is chosen at random on a line segment AB of length 2a. Calculate 
the expected values of the rectangle AP. PB and the difference |AP - PB|. 


3. If X is uniformly distributed over (0, Ат), compute the expectation of the 
function sinX. Also find the distribution of sin.X, and show that the mean of this 
distribution is the same as the above expectation. 


4. In Banach's match-box problem (Ex. 7 Sec. 5.10) find the expectation of the number of matches left in one of the boxes when the other box is just found empty.

5. Show that the expectation of the number of failures preceding the first success in an infinite sequence of Bernoulli trials with probability of success $p$ is $(1 - p)/p$.


6. If $X$ is a $\gamma(l)$ variate, compute $E(\sqrt{X})$.

7. Find the mean and variance of the rectangular distribution.

8. The Pascal distribution is defined by $x_i = i$ $(i = 0, 1, 2, \ldots)$ and

$$f_i = \frac{1}{1 + \mu}\left(\frac{\mu}{1 + \mu}\right)^i \quad (\mu > 0)$$

Find the mean and variance of this distribution.

9. Given that the variate $X$ is normal (0, 1), find the variance of $e^X$.


10. The first, second and third moments of a probability distribution about the 


point 2 are 1, 16, —40 respectively. Find the mean, variance and the third central 
moment. 


11. The probability density of a continuous distribution is given by: $f(x) = \frac{3}{4}x(2 - x)$ $(0 < x < 2)$. Compute the mean, variance and the coefficient of skewness $\gamma_1$.

12. For the binomial $(n, p)$ distribution, prove that

$$\mu_{k+1} = p(1 - p)\left(nk\mu_{k-1} + \frac{d\mu_k}{dp}\right)$$

and hence obtain $\gamma_1$ and $\gamma_2$.

13. Prove that $E(X^2) \ge \{E(|X|)\}^2$. Deduce that the first absolute moment about the mean is at most equal to the standard deviation.

14. For the Poisson distribution with parameter $\mu$, prove that

$$\mu_{k+1} = \mu\left(k\mu_{k-1} + \frac{d\mu_k}{d\mu}\right)$$

Hence calculate $\gamma_1$ and $\gamma_2$.

15. Show that the first absolute moment about the mean for the normal $(m, \sigma)$ distribution is $\sqrt{2/\pi}\,\sigma$.

16. Calculate the $k$th moment (about the origin) for a $\beta_1(l, m)$ distribution, and hence obtain the variance. Show also that, for $l, m > 1$, there is a unique mode having the value $(l - 1)/(l + m - 2)$.




17. A continuous distribution is given by

$$f(x) = \frac{1}{x\sqrt{2\pi}}\, e^{-\frac{1}{2}(\log x)^2} \quad (x > 0)$$
$$\quad = 0 \quad (x \le 0)$$

which is called a log-normal distribution. For this distribution calculate the mean, mode and standard deviation, and obtain the coefficient of skewness (7.11.2).

18. Show that the mode $M$ of the Poisson distribution with mean $\mu$ is the integer(s) determined by the inequalities: $\mu - 1 \le M \le \mu$.

19. A continuous distribution has probability density $f(x) = ae^{-ax}$ $(0 < x < \infty;\ a > 0)$. Calculate the moment generating function, and hence obtain $\alpha_k$.

20. Prove that the moment generating function of a uniform distribution over the interval $(-a, a)$ is $\sinh at/at$. Hence calculate the central moments.

21. Show that the characteristic function for the Pascal distribution defined in Ex. 8 is $[1 - \mu(e^{it} - 1)]^{-1}$. By expanding the characteristic function in powers of $it$ find $\alpha_1$, $\alpha_2$, $\alpha_3$, $\alpha_4$ and hence calculate the coefficient of skewness $\gamma_1$ and the coefficient of excess $\gamma_2$.

Also calculate the first four cumulants of the distribution and verify therefrom the values of $\gamma_1$ and $\gamma_2$.

22. Find the first four cumulants of the Laplace distribution defined by

$$f(x) = \frac{1}{2\lambda}\, e^{-|x - \mu|/\lambda} \quad (-\infty < x < \infty;\ \lambda > 0)$$

and hence find the values of $m$, $\sigma$, $\gamma_1$ and $\gamma_2$.
23. Find the mean, median and the mode of a binomial (4, 3) variate. 
24. Find the median for the Poisson distribution having mean 2. 


25. Find the lower and upper quartiles for the distribution of the number of 
points on a card drawn at random from a full pack, and calculate the semi-inter- 
quartile range. (Take 11 points for the jack, 12 for the queen and 13 for the king.) 


26. Calculate the first absolute moment about the mean and the semi-inter- 
quartile range for the Laplace distribution defined in Ex. 22. 


CHAPTER 8 
MATHEMATICAL EXPECTATIONS II 
A. TWO-DIMENSIONAL CASE 


8.1 EXPECTATION FOR A BIVARIATE DISTRIBUTION 


Consider the joint distribution of two random variables $X$ and $Y$. The expectation or the mean value of a continuous function $g(X, Y)$ of $X$, $Y$ is defined by

$$E\{g(X, Y)\} = \sum_{j=-\infty}^{\infty} \sum_{i=-\infty}^{\infty} g(x_i, y_j) f_{ij} \quad \text{for the discrete case}$$

$$\qquad\qquad = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x, y) f(x, y)\,dx\,dy \quad \text{for the continuous case} \tag{8.1.1}$$

provided the series or integral is absolutely convergent.

Remark. Now the mean value $E\{g(X)\}$ may be calculated in two ways, viz. (i) by formula (7.1.1) with respect to the distribution of $X$ alone and (ii) by the above formula (8.1.1) with respect to the joint distribution of $X$ with any other random variable $Y$, and it needs proving that these two values are the same so that our definitions are consistent.


Proof. DISCRETE CASE. According to (8.1.1)

$$E\{g(X)\} = \sum_j \sum_i g(x_i) f_{ij} = \sum_i g(x_i) \sum_j f_{ij} = \sum_i g(x_i) f_{xi}$$

which is the value given by (7.1.1).

CONTINUOUS CASE. Here

$$E\{g(X)\} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x) f(x, y)\,dx\,dy = \int_{-\infty}^{\infty} g(x)\,dx \int_{-\infty}^{\infty} f(x, y)\,dy = \int_{-\infty}^{\infty} g(x) f_x(x)\,dx$$

It follows from the above remark that the characteristics such as the mean, variance etc. of the random variables $X$ and $Y$ are uniquely defined whether they are calculated from their individual distributions or from their joint distribution.


Geometrically, the point $(m_x, m_y)$ represents the centre of mass of the two-dimensional probability mass distribution; the variances $\sigma_x^2$, $\sigma_y^2$ represent respectively the moments of inertia of the mass distribution about the lines $x = m_x$ and $y = m_y$, which are parallel to the axes passing through the centre of mass, and are thus measures of dispersion about the said lines.

An obvious but important property of expectations is

$$E\{g_1(X, Y) + g_2(X, Y) + \cdots + g_n(X, Y)\} = E\{g_1(X, Y)\} + E\{g_2(X, Y)\} + \cdots + E\{g_n(X, Y)\} \tag{8.1.2}$$

provided all the expectations on the R.H.S. exist (which implies the existence of the L.H.S.).

In particular, we have

$$E(X + Y) = E(X) + E(Y) \tag{8.1.3}$$

(8.1.3) gives the addition rule for mean values, which states that if the mean values of $X$ and $Y$ exist, then the mean value of their sum $X + Y$ also exists and is equal to the sum of their mean values.


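Examples

1. If $X$, $Y$ are random variables defined in Ex. 1 Sec. 6.2, compute $E(|X - Y|)$.

Here $(x_i, y_j) = (i, j)$ $(i = 0, 1, 2;\ j = 0, 1, 2, 3)$ and $f_{ij} = 1/9$ for all $i$, $j$ except that $f_{13} = f_{22} = f_{23} = 0$. Hence

$$E(|X - Y|) = \sum_{j=0}^{3} \sum_{i=0}^{2} |i - j|\, f_{ij}$$

$$= \tfrac{1}{9}\{|0 - 0| + |0 - 1| + |0 - 2| + |0 - 3| + |1 - 0| + |1 - 1| + |1 - 2| + |2 - 0| + |2 - 1|\} = 11/9$$

2. If $X$, $Y$ are independent standard normal variates, find the mean value of the greater of $|X|$ and $|Y|$.

$$f(x, y) = \frac{1}{2\pi}\, e^{-\frac{1}{2}(x^2 + y^2)} \quad (-\infty < x < \infty,\ -\infty < y < \infty)$$

$$E\{\max(|X|, |Y|)\} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \max(|x|, |y|)\, f(x, y)\,dx\,dy$$

$$= \frac{1}{2\pi} \iint_{|x| > |y|} |x|\, e^{-\frac{1}{2}(x^2 + y^2)}\,dx\,dy + \frac{1}{2\pi} \iint_{|y| > |x|} |y|\, e^{-\frac{1}{2}(x^2 + y^2)}\,dx\,dy$$

$$= \frac{1}{\pi} \iint_{|x| > |y|} |x|\, e^{-\frac{1}{2}(x^2 + y^2)}\,dx\,dy \qquad [\text{from symmetry}$$

$$= \frac{2}{\pi} \int_{-\infty}^{\infty} e^{-y^2/2}\,dy \int_{|y|}^{\infty} x\, e^{-x^2/2}\,dx \qquad [\text{from symmetry}$$

$$= \frac{2}{\pi} \int_{-\infty}^{\infty} e^{-y^2}\,dy = \frac{2}{\sqrt{\pi}}$$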
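The double sum in Example 1 is easy to evaluate mechanically from (8.1.1). The sketch below assumes the joint masses described in that example, with the three massless points $(1, 3)$, $(2, 2)$, $(2, 3)$ inferred from the nine terms that appear in the sum; the function and variable names are illustrative. It also confirms the addition rule (8.1.3) on the same distribution.

```python
from fractions import Fraction

# Joint masses of Example 1: 1/9 at every point of {0,1,2} x {0,1,2,3}
# except the three points below, which are assumed to carry no mass.
ZERO_POINTS = {(1, 3), (2, 2), (2, 3)}
joint = {
    (i, j): (Fraction(0) if (i, j) in ZERO_POINTS else Fraction(1, 9))
    for i in range(3)
    for j in range(4)
}


def expectation(g, joint):
    """E{g(X, Y)} as the double sum of g(x_i, y_j) f_ij over the spectrum, as in (8.1.1)."""
    return sum(g(x, y) * f for (x, y), f in joint.items())


e_abs_diff = expectation(lambda x, y: abs(x - y), joint)
e_x = expectation(lambda x, y: x, joint)
e_y = expectation(lambda x, y: y, joint)
e_sum = expectation(lambda x, y: x + y, joint)
print(e_abs_diff, e_x, e_y, e_sum)
```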


8.2 MOMENTS

We define the moments (about the origin) of the joint distribution of $X$ and $Y$ by

$$\alpha_{kl} = E(X^k Y^l) \tag{8.2.1}$$

where $k$, $l$ are non-negative integers; $\alpha_{kl}$ is called a moment of order $k + l$. We have

$$\alpha_{k0} = \alpha_{xk}, \quad \alpha_{0l} = \alpha_{yl}$$

In particular

$$\alpha_{00} = 1, \quad \alpha_{10} = m_x, \quad \alpha_{01} = m_y$$

The central moments are given by

$$\mu_{kl} = E\{(X - m_x)^k (Y - m_y)^l\} \tag{8.2.2}$$

Hence

$$\mu_{k0} = \mu_{xk}, \quad \mu_{0l} = \mu_{yl}$$

$$\mu_{00} = 1, \quad \mu_{10} = 0, \quad \mu_{01} = 0, \quad \mu_{20} = \sigma_x^2, \quad \mu_{02} = \sigma_y^2$$
8.3 COVARIANCE, CORRELATION COEFFICIENT

The second order mixed central moment $\mu_{11}$ furnishes an important measure of, what we may roughly say, the jointness of the bivariate distribution and is called the covariance of $X$ and $Y$, to be denoted by cov$(X, Y)$, i.e.

$$\text{cov}(X, Y) = \mu_{11} = E\{(X - m_x)(Y - m_y)\} \tag{8.3.1}$$

Now if we want to find a dimensionless measure of the property expressed by the covariance, we have to introduce, following our usual practice, the standardised random variables $X^*$, $Y^*$ in places of $X$, $Y$ respectively and take cov$(X^*, Y^*)$ as the required measure. We shall call cov$(X^*, Y^*)$ the correlation coefficient of $X$ and $Y$ and denote it by $\rho(X, Y)$ or $\rho_{xy}$ or simply $\rho$, i.e.

$$\rho(X, Y) = \text{cov}(X^*, Y^*) = E(X^* Y^*) = \frac{E\{(X - m_x)(Y - m_y)\}}{\sigma_x \sigma_y} = \frac{\text{cov}(X, Y)}{\sigma_x \sigma_y} \quad (\sigma_x, \sigma_y > 0) \tag{8.3.2}$$




1. If $a_1\ (\ne 0)$, $a_2\ (\ne 0)$, $b_1$, $b_2$ are constants, then

$$\rho(a_1 X + b_1,\ a_2 Y + b_2) = \frac{a_1 a_2}{|a_1 a_2|}\, \rho(X, Y) \tag{8.3.3}$$

Proof. $(a_1 X + b_1)^* = \dfrac{a_1}{|a_1|} X^*$, $\quad (a_2 Y + b_2)^* = \dfrac{a_2}{|a_2|} Y^*$

So

$$\rho(a_1 X + b_1,\ a_2 Y + b_2) = E\left(\frac{a_1 a_2}{|a_1||a_2|} X^* Y^*\right) = \frac{a_1 a_2}{|a_1 a_2|} E(X^* Y^*) = \text{R.H.S. of (8.3.3)}$$

Hence if $a_1, a_2 > 0$

$$\rho(a_1 X + b_1,\ a_2 Y + b_2) = \rho(X, Y) \tag{8.3.4}$$

This shows that the correlation coefficient is independent of the choice of origins and units of measurements of the random variables. In particular, $\rho(X^*, Y^*) = \rho(X, Y)$, since $\sigma_x, \sigma_y > 0$.

2. $$-1 \le \rho(X, Y) \le 1 \tag{8.3.5}$$

Proof. $0 \le (X^* \pm Y^*)^2 = X^{*2} + Y^{*2} \pm 2X^* Y^*$

Considering expectations and remembering that $E(X^{*2}) = E(Y^{*2}) = 1$, we get

$$0 \le E\{(X^* \pm Y^*)^2\} = 2\{1 \pm \rho(X, Y)\}$$

which gives (8.3.5).

If $\rho(X, Y) = \pm 1$, we must have $X^* \mp Y^* = 0$ or $Y^* = \pm X^*$ or

$$\frac{Y - m_y}{\sigma_y} = \pm \frac{X - m_x}{\sigma_x} \tag{8.3.6}$$

Thus if $\rho(X, Y) = \pm 1$, $Y$ is a linear function of $X$ given by (8.3.6), or in other words, the whole probability mass of the bivariate distribution is situated on the straight line

$$\frac{y - m_y}{\sigma_y} = \pm \frac{x - m_x}{\sigma_x}$$

Conversely, if $Y = aX + b$, $a\ (\ne 0)$, $b$ being constants, $\rho(X, Y) = \pm 1$, for

$$\rho(X,\ aX + b) = \frac{a}{|a|}\, \rho(X, X) = \pm 1$$



3. If $\rho(X, Y) = 0$, we cannot, however, conclude that $X$ and $Y$ are independent. In that case we shall simply say that $X$ and $Y$ are uncorrelated. The detailed discussion regarding this will be taken up in Sec. 8.7.

4. Let us calculate the variance of $X + Y$. We have $E(X + Y) = m_x + m_y$ and

$$\{X + Y - (m_x + m_y)\}^2 = \{(X - m_x) + (Y - m_y)\}^2 = (X - m_x)^2 + (Y - m_y)^2 + 2(X - m_x)(Y - m_y)$$

Taking expectations of both sides, we get the variance formula:

$$\text{var}(X + Y) = \text{var}(X) + \text{var}(Y) + 2\,\text{cov}(X, Y) \tag{8.3.7}$$

which, in another form, is

$$\sigma^2(X + Y) = \sigma^2(X) + \sigma^2(Y) + 2\sigma(X)\sigma(Y)\rho(X, Y) \tag{8.3.8}$$

If $X$ and $Y$ are uncorrelated

$$\sigma^2(X + Y) = \sigma^2(X) + \sigma^2(Y) \tag{8.3.9}$$

5. $$\mu_{11} = \alpha_{11} - m_x m_y \tag{8.3.10}$$

Proof. $(X - m_x)(Y - m_y) = XY - m_y X - m_x Y + m_x m_y$. Hence

$$\mu_{11} = E(XY) - m_y E(X) - m_x E(Y) + m_x m_y = \alpha_{11} - m_x m_y$$

We shall now prove an interesting theorem.

Theorem. If, for any pair of correlated random variables $X$ and $Y$, we make a linear transformation $(X, Y) \to (U, V)$ given by a rotation of the axes through a constant angle $\alpha$, i.e.

$$U = X\cos\alpha + Y\sin\alpha, \quad V = -X\sin\alpha + Y\cos\alpha \tag{8.3.11}$$

then $U$ and $V$ will be uncorrelated if $\alpha$ is given by

$$\tan 2\alpha = \frac{2\rho\sigma_x\sigma_y}{\sigma_x^2 - \sigma_y^2} \tag{8.3.12}$$

where $\rho = \rho(X, Y)$.

Proof. $m_u = m_x\cos\alpha + m_y\sin\alpha$, $\quad m_v = -m_x\sin\alpha + m_y\cos\alpha$

So

$$(U - m_u)(V - m_v) = \{(X - m_x)\cos\alpha + (Y - m_y)\sin\alpha\} \times \{-(X - m_x)\sin\alpha + (Y - m_y)\cos\alpha\}$$

$$= -\tfrac{1}{2}\{(X - m_x)^2 - (Y - m_y)^2\}\sin 2\alpha + (X - m_x)(Y - m_y)\cos 2\alpha$$

Hence

$$\text{cov}(U, V) = -\tfrac{1}{2}(\sigma_x^2 - \sigma_y^2)\sin 2\alpha + \rho\sigma_x\sigma_y\cos 2\alpha$$

which vanishes if (8.3.12) holds. Hence the theorem.
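A quick numerical check of the theorem: pick values of the two standard deviations and the correlation coefficient, compute the angle from (8.3.12), and evaluate the covariance of the rotated variables from the last formula of the proof. The sketch uses `math.atan2` rather than a plain arctangent so that the case of equal standard deviations (where the denominator in (8.3.12) vanishes) needs no special treatment; that choice, the function names and the sample values are assumptions made for illustration.

```python
import math


def decorrelating_angle(sx, sy, rho):
    """An angle alpha satisfying tan 2*alpha = 2*rho*sx*sy / (sx^2 - sy^2), cf. (8.3.12)."""
    return 0.5 * math.atan2(2 * rho * sx * sy, sx**2 - sy**2)


def cov_after_rotation(sx, sy, rho, alpha):
    """cov(U, V) for the rotation (8.3.11), from the last line of the proof."""
    return (-0.5 * (sx**2 - sy**2) * math.sin(2 * alpha)
            + rho * sx * sy * math.cos(2 * alpha))


sx, sy, rho = 2.0, 1.0, 0.6  # illustrative values only
alpha = decorrelating_angle(sx, sy, rho)
residual = cov_after_rotation(sx, sy, rho, alpha)
print(alpha, residual)
```

The residual covariance should vanish up to rounding error, while for any other angle in general it does not.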


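Examples

1. In Ex. 1 Sec. 6.2 calculate $m_x$, $m_y$, $\sigma_x^2$, $\sigma_y^2$ and $\rho$.

$$m_x = \sum_{j=0}^{3}\sum_{i=0}^{2} i f_{ij} = \tfrac{7}{9}, \quad \alpha_{x2} = \sum_{j=0}^{3}\sum_{i=0}^{2} i^2 f_{ij} = \tfrac{11}{9}$$

so that $\sigma_x^2 = \frac{50}{81}$. These may also be calculated from the marginal distribution of $X$. Similarly

$$m_y = \tfrac{10}{9}, \quad \alpha_{y2} = \tfrac{20}{9}, \quad \sigma_y^2 = \tfrac{80}{81}, \quad \alpha_{11} = \sum_{j=0}^{3}\sum_{i=0}^{2} ij f_{ij} = \tfrac{5}{9}$$

Hence by (8.3.10)

$$\mu_{11} = -25/81, \quad \rho = -\sqrt{10}/8$$

2. BIVARIATE NORMAL DISTRIBUTION. Since the (marginal) distributions of $X$ and $Y$ are normal $(m_x, \sigma_x)$ and $(m_y, \sigma_y)$ respectively, the parameters $m_x$, $m_y$, $\sigma_x$, $\sigma_y$ have their natural significances, and we have

$$\text{cov}(X, Y) = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1 - \rho^2}} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x - m_x)(y - m_y) \exp\left[-\frac{1}{2(1 - \rho^2)}\left\{\frac{(x - m_x)^2}{\sigma_x^2} - \frac{2\rho(x - m_x)(y - m_y)}{\sigma_x\sigma_y} + \frac{(y - m_y)^2}{\sigma_y^2}\right\}\right] dx\,dy$$

$$= \frac{\sigma_x\sigma_y}{2\pi\sqrt{1 - \rho^2}} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} xy\, e^{-(x^2 - 2\rho xy + y^2)/2(1 - \rho^2)}\,dx\,dy$$

or

$$\rho(X, Y) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} y\, e^{-y^2/2} \left\{\frac{1}{\sqrt{2\pi(1 - \rho^2)}} \int_{-\infty}^{\infty} x\, e^{-(x - \rho y)^2/2(1 - \rho^2)}\,dx\right\} dy = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \rho y^2 e^{-y^2/2}\,dy = \rho$$

Thus the parameter $\rho$ denotes the correlation coefficient of $X$ and $Y$.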
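The numerical values in Example 1 can be reproduced exactly from the joint masses. The sketch below assumes the joint distribution used in Sec. 8.1 (mass 1/9 at each point of $\{0,1,2\} \times \{0,1,2,3\}$ except $(1,3)$, $(2,2)$, $(2,3)$); the helper name `E` is illustrative. Since $\rho$ is irrational, the check is made on $\rho^2$, and the variance formula (8.3.7) is verified on the same data.

```python
from fractions import Fraction

# Assumed joint masses of Ex. 1 Sec. 6.2 as described in Sec. 8.1.
ZERO_POINTS = {(1, 3), (2, 2), (2, 3)}
joint = {
    (i, j): (Fraction(0) if (i, j) in ZERO_POINTS else Fraction(1, 9))
    for i in range(3)
    for j in range(4)
}


def E(g):
    """Expectation of g(X, Y) with respect to the joint masses."""
    return sum(g(x, y) * f for (x, y), f in joint.items())


m_x, m_y = E(lambda x, y: x), E(lambda x, y: y)
var_x = E(lambda x, y: x * x) - m_x**2
var_y = E(lambda x, y: y * y) - m_y**2
cov_xy = E(lambda x, y: x * y) - m_x * m_y          # (8.3.10)
rho_squared = cov_xy**2 / (var_x * var_y)            # square of (8.3.2)
var_sum = E(lambda x, y: (x + y) ** 2) - (m_x + m_y) ** 2
print(m_x, m_y, var_x, var_y, cov_xy, rho_squared, var_sum)
```

Here $\rho^2 = 5/32$ with a negative covariance, which corresponds to $\rho = -\sqrt{10}/8$.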




8.4 CHARACTERISTIC FUNCTION 


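The characteristic function $\chi(t, u)$ of the joint distribution of $X$, $Y$ is defined by

$$\chi(t, u) = E\{e^{i(tX + uY)}\} \tag{8.4.1}$$

We note $\chi(t, 0) = \chi_x(t)$, $\chi(0, u) = \chi_y(u)$, $\left[\dfrac{\partial^2 \chi}{\partial t\,\partial u}\right]_{t=u=0} = i^2\alpha_{11}$ etc. Therefore, the development of $\chi(t, u)$ in powers of $it$ and $iu$ will be

$$\chi(t, u) = 1 + (\alpha_{x1}\,it + \alpha_{y1}\,iu) + \frac{1}{2!}\{\alpha_{x2}(it)^2 + 2\alpha_{11}(it)(iu) + \alpha_{y2}(iu)^2\} + \cdots \tag{8.4.2}$$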


8.5 SOME EXTENSIONS TO n-DIMENSIONS 

For any set of $n$ random variables $X_1, X_2, \ldots, X_n$ having means $m_1, m_2, \ldots, m_n$ and standard deviations $\sigma_1, \sigma_2, \ldots, \sigma_n$ respectively, the mean $M_n$ and standard deviation $\Sigma_n$ of their sum

$$S_n = X_1 + X_2 + \cdots + X_n \tag{8.5.1}$$

are obtained by formal generalisations of (8.1.3) and (8.3.8) which give

$$M_n = m_1 + m_2 + \cdots + m_n \tag{8.5.2}$$

$$\Sigma_n^2 = \sum_{i=1}^{n} \sigma_i^2 + 2\sum_{i<j} \sigma_i\sigma_j\,\rho(X_i, X_j) \tag{8.5.3}$$

($i$, $j$ being any combination of $1, 2, \ldots, n$ taken 2 at a time in the second summation.)

If $X_1, X_2, \ldots, X_n$ are pairwise uncorrelated, we get the simple variance formula

$$\Sigma_n^2 = \sigma_1^2 + \sigma_2^2 + \cdots + \sigma_n^2 \tag{8.5.4}$$

More generally, for a linear combination

$$X = a_1 X_1 + a_2 X_2 + \cdots + a_n X_n \tag{8.5.5}$$

we have using (7.4.5)

$$m_x = a_1 m_1 + a_2 m_2 + \cdots + a_n m_n \tag{8.5.6}$$

$$\sigma_x^2 = \sum_{i=1}^{n} a_i^2\sigma_i^2 + 2\sum_{i<j} a_i a_j \sigma_i\sigma_j\,\rho(X_i, X_j) \tag{8.5.7}$$

If $\rho(X_i, X_j) = 0$ $(i \ne j)$ the last equation reduces to

$$\sigma_x^2 = a_1^2\sigma_1^2 + a_2^2\sigma_2^2 + \cdots + a_n^2\sigma_n^2 \tag{8.5.8}$$


Example. DRAWINGS WITHOUT REPLACEMENT. If $n$ balls are drawn successively without replacements from an urn containing $N_1$ white and $N_2$ black balls $(n \le N_1 + N_2)$, find the mean and variance of the number of white balls.

Instead of calculating the mean and variance directly from the distribution of the number of white balls, let us here follow an indirect method which will illustrate the use of the above formulæ. We define $n$ random variables $X_1, X_2, \ldots, X_n$ on the event space of $n$ drawings by: $X_i = 0$ or $1$ corresponding to black or white ball in the $i$th trial, so that $S_n$ denotes the number of white balls.

From the symmetry of the situation, while considering the $i$th drawing we may forget about the rest of the drawings, and as such

$$P(X_i = 1) = N_1/(N_1 + N_2)$$

So

$$m_i = E(X_i) = N_1/(N_1 + N_2), \quad E(X_i^2) = N_1/(N_1 + N_2)$$

Hence by (7.4.3)

$$\sigma_i^2 = \frac{N_1}{N_1 + N_2} - \frac{N_1^2}{(N_1 + N_2)^2} = \frac{N_1 N_2}{(N_1 + N_2)^2}$$

By (8.5.2)

$$M_n = \frac{nN_1}{N_1 + N_2}$$

Now consider the distribution of the two-dimensional variate $(X_i, X_j)$ $(i \ne j)$; its spectrum consists of the four points $(0, 0)$, $(1, 0)$, $(0, 1)$, $(1, 1)$, the probability mass at the last point being $\dfrac{N_1(N_1 - 1)}{(N_1 + N_2)(N_1 + N_2 - 1)}$. Then

$$E(X_i X_j) = \frac{N_1(N_1 - 1)}{(N_1 + N_2)(N_1 + N_2 - 1)}$$

and by (8.3.10)

$$\text{cov}(X_i, X_j) = \frac{N_1(N_1 - 1)}{(N_1 + N_2)(N_1 + N_2 - 1)} - \frac{N_1^2}{(N_1 + N_2)^2} = -\frac{N_1 N_2}{(N_1 + N_2)^2(N_1 + N_2 - 1)}$$

From (8.5.3)

$$\Sigma_n^2 = \frac{nN_1 N_2}{(N_1 + N_2)^2} - 2\binom{n}{2}\frac{N_1 N_2}{(N_1 + N_2)^2(N_1 + N_2 - 1)}$$

or

$$\Sigma_n^2 = \frac{nN_1 N_2}{(N_1 + N_2)^2}\left(1 - \frac{n - 1}{N_1 + N_2 - 1}\right) \tag{8.5.9}$$

Remark. In the case of drawings with replacement, the number of white balls has, we know, a binomial distribution with parameters $\{n,\ N_1/(N_1 + N_2)\}$. Hence

$$M_n = \frac{nN_1}{N_1 + N_2}, \quad \Sigma_n^2 = \frac{nN_1 N_2}{(N_1 + N_2)^2}$$

so that the mean remains the same but the variance is different.
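The mean and variance obtained by the indirect method can be compared with a direct calculation. The sketch below assumes that the number of white balls in drawings without replacement has the hypergeometric masses $\binom{N_1}{k}\binom{N_2}{n-k}\big/\binom{N_1+N_2}{n}$, computes the mean and variance exactly from them, and compares with $nN_1/(N_1+N_2)$ and the variance formula for drawings without replacement. The function names are illustrative.

```python
from fractions import Fraction
from math import comb


def white_ball_moments(n, n1, n2):
    """Exact mean and variance of the number of white balls, computed directly
    from the (assumed) hypergeometric masses."""
    total = comb(n1 + n2, n)
    masses = {
        k: Fraction(comb(n1, k) * comb(n2, n - k), total)
        for k in range(max(0, n - n2), min(n, n1) + 1)
    }
    mean = sum(k * f for k, f in masses.items())
    var = sum(k * k * f for k, f in masses.items()) - mean**2
    return mean, var


def formula_moments(n, n1, n2):
    """Mean n*N1/(N1+N2) and variance n*N1*N2/(N1+N2)^2 * (1 - (n-1)/(N1+N2-1))."""
    big_n = n1 + n2
    mean = Fraction(n * n1, big_n)
    var = Fraction(n * n1 * n2, big_n**2) * (1 - Fraction(n - 1, big_n - 1))
    return mean, var


print(white_ball_moments(4, 5, 7), formula_moments(4, 5, 7))
```

Setting $n = N_1 + N_2$ makes the variance vanish, as it should when every ball is drawn.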


The joint characteristic function of $X_1, X_2, \ldots, X_n$ will be given by

$$\chi(t_1, t_2, \ldots, t_n) = E\{e^{i(t_1 X_1 + t_2 X_2 + \cdots + t_n X_n)}\} \tag{8.5.10}$$


B. INDEPENDENT RANDOM VARIABLES 
8.6. MULTIPLICATION RULE FOR EXPECTATIONS 


Theorem. If $X$ and $Y$ are independent random variables and $g_1(X)$ and $g_2(Y)$ are continuous functions of $X$ and $Y$ respectively whose expectations exist, then

$$E\{g_1(X) g_2(Y)\} = E\{g_1(X)\}\, E\{g_2(Y)\} \tag{8.6.1}$$

Proof. DISCRETE CASE

$$E\{g_1(X)\} = \sum_i g_1(x_i) f_{xi}, \quad E\{g_2(Y)\} = \sum_j g_2(y_j) f_{yj}$$

If $E\{g_1(X)\}$ and $E\{g_2(Y)\}$ exist, the series representing these are absolutely convergent, and therefore

$$\left\{\sum_i g_1(x_i) f_{xi}\right\}\left\{\sum_j g_2(y_j) f_{yj}\right\} = \sum_i \sum_j g_1(x_i) g_2(y_j) f_{xi} f_{yj}$$

and the series on the R.H.S. is also absolutely convergent.

Since $X$ and $Y$ are independent, $f_{xi} f_{yj} = f_{ij}$ for all $i$, $j$. So

$$E\{g_1(X)\}\, E\{g_2(Y)\} = \sum_i \sum_j g_1(x_i) g_2(y_j) f_{xi} f_{yj} = \sum_i \sum_j g_1(x_i) g_2(y_j) f_{ij} = E\{g_1(X) g_2(Y)\}$$

This shows that $E\{g_1(X) g_2(Y)\}$ exists, and formula (8.6.1) holds.

CONTINUOUS CASE. The proof is similar to the discrete case. We have

$$E\{g_1(X)\} = \int_{-\infty}^{\infty} g_1(x) f_x(x)\,dx, \quad E\{g_2(Y)\} = \int_{-\infty}^{\infty} g_2(y) f_y(y)\,dy$$

the integrals being absolutely convergent. Hence

$$E\{g_1(X)\}\, E\{g_2(Y)\} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g_1(x) g_2(y) f_x(x) f_y(y)\,dx\,dy$$

and the double integral is absolutely convergent.

For independence of $X$ and $Y$, we have $f_x(x) f_y(y) = f(x, y)$. Inserting this in the above equation, we get the theorem.


A particular form of (8.6.1) is 


E(XY) = E(X) E(Y) (8.6.2) 
which is called the multiplication rule for mean values. Stated 
completely, the multiplication rule says that if X and Y are independent 
random variables having existent means, then the mean of their 
product XY exists and is equal to the product of their means. 


This theorem may be easily generalised to more than two variates. 


For three variates $X$, $Y$, $Z$, if $(X, Y)$ and $Z$ are independent, we shall have

$$E\{g_1(X, Y) g_2(Z)\} = E\{g_1(X, Y)\}\, E\{g_2(Z)\} \tag{8.6.3}$$

provided the expectations on the R.H.S. exist.




If $X$, $Y$, $Z$ are mutually independent, it follows from Theorem II Sec. 6.7 that $(X, Y)$ and $Z$ are independent so that (8.6.3) holds, and further

$$E\{g_1(X) g_2(Y) g_3(Z)\} = E\{g_1(X)\}\, E\{g_2(Y)\}\, E\{g_3(Z)\} \tag{8.6.4}$$

Generalising to the $n$-variate case we have the following result: If $X_1, X_2, \ldots, X_n$ are mutually independent, then by Theorem III (b) Sec. 6.7 the random variables

$$(X_1, \ldots, X_{k_1}),\ (X_{k_1+1}, \ldots, X_{k_2}),\ \ldots,\ (X_{k_h+1}, \ldots, X_n)$$

where $1 \le k_1 < k_2 < \cdots < k_h < n$, are also mutually independent, and hence

$$E\{g_1(X_1, \ldots, X_{k_1})\, g_2(X_{k_1+1}, \ldots, X_{k_2}) \cdots g_{h+1}(X_{k_h+1}, \ldots, X_n)\}$$

$$= E\{g_1(X_1, \ldots, X_{k_1})\}\, E\{g_2(X_{k_1+1}, \ldots, X_{k_2})\} \cdots E\{g_{h+1}(X_{k_h+1}, \ldots, X_n)\} \tag{8.6.5}$$

where the $g$'s denote continuous functions of their arguments, provided all the expectations on the R.H.S. exist.

In particular, we have the simple formula

$$E\{g_1(X_1) g_2(X_2) \cdots g_n(X_n)\} = E\{g_1(X_1)\}\, E\{g_2(X_2)\} \cdots E\{g_n(X_n)\} \tag{8.6.6}$$


8.7 MOMENTS 
If $X$, $Y$ are independent, it follows from (8.6.1) that

$$\alpha_{kl} = E(X^k)\, E(Y^l) = \alpha_{xk}\,\alpha_{yl} \tag{8.7.1}$$

Similarly

$$\mu_{kl} = \mu_{xk}\,\mu_{yl} \tag{8.7.2}$$

Hence $\mu_{11} = \mu_{x1}\mu_{y1} = 0$ or $\rho(X, Y) = 0$.

Thus if $X$, $Y$ are independent, they are necessarily uncorrelated. But the converse of this is not true; the random variables may be dependent, but their correlation coefficient may vanish due to symmetry of the distribution. The following example is to the point.

Example. Let $X$ have any distribution symmetrical about the origin. Then $m_x = E(X) = 0$, $E(X^3) = 0$. Setting $Y = X^2$

$$\text{cov}(X, Y) = E\{X(Y - m_y)\} = E\{X(X^2 - m_y)\} = 0$$

i.e. $X$, $Y$ are uncorrelated, although $X$, $Y$ are even functionally dependent.

Remark. If $X_1, X_2, \ldots, X_n$ are mutually independent, they are certainly pairwise independent (cf. Theorem III (a) Sec. 6.7) so that they are pairwise uncorrelated, and hence the simplified variance formulæ (8.5.4) and (8.5.8) hold for these variates.
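A concrete instance of an uncorrelated but dependent pair may help. In the sketch below $X$ takes the values $-1, 0, 1$ with equal probability (an illustrative choice of a distribution symmetrical about the origin) and $Y = X^2$; the covariance is computed exactly and the product rule for independence is tested at the point $(0, 0)$.

```python
from fractions import Fraction

# Illustrative symmetric distribution: X = -1, 0, 1 each with mass 1/3, and Y = X^2.
x_masses = {-1: Fraction(1, 3), 0: Fraction(1, 3), 1: Fraction(1, 3)}
joint = {(x, x * x): f for x, f in x_masses.items()}


def E(g):
    """Expectation of g(X, Y) with respect to the joint masses."""
    return sum(g(x, y) * f for (x, y), f in joint.items())


cov_xy = E(lambda x, y: x * y) - E(lambda x, y: x) * E(lambda x, y: y)

# Independence would require P(X=0, Y=0) = P(X=0) P(Y=0).
p_x0 = x_masses[0]
p_y0 = sum(f for (x, y), f in joint.items() if y == 0)
p_joint_00 = joint.get((0, 0), Fraction(0))
product_rule_holds = p_joint_00 == p_x0 * p_y0
print(cov_xy, product_rule_holds)
```

The covariance is exactly zero while the product rule fails, so the pair is uncorrelated without being independent.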


8.8 CHARACTERISTIC FUNCTION 
For independent $X$, $Y$

$$\chi(t, u) = E(e^{itX} e^{iuY}) = E(e^{itX})\, E(e^{iuY})$$

or

$$\chi(t, u) = \chi_x(t)\,\chi_y(u) \tag{8.8.1}$$

Conversely, if (8.8.1) holds, then $X$, $Y$ can be proved to be independent. The proof of the converse is, however, beyond the level of this book. Taking this for granted, we have, generalising for $n$ variates, the following important theorem.

Theorem. A necessary and sufficient condition for the random variables $X_1, X_2, \ldots, X_n$ to be mutually independent is that their joint characteristic function is given by

$$\chi(t_1, t_2, \ldots, t_n) = \chi_1(t_1)\,\chi_2(t_2) \cdots \chi_n(t_n) \tag{8.8.2}$$

where $\chi_1(t_1), \chi_2(t_2), \ldots, \chi_n(t_n)$ respectively denote the characteristic functions of $X_1, X_2, \ldots, X_n$.


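We may now suggest a simple proof of Theorem III (c) Sec. 6.7.

Setting $Y_1 = g_1(X_1, \ldots, X_{k_1})$, $Y_2 = g_2(X_{k_1+1}, \ldots, X_{k_2})$, $\ldots$, $Y_{h+1} = g_{h+1}(X_{k_h+1}, \ldots, X_n)$, we have

$$\chi_{y_1, y_2, \ldots, y_{h+1}}(t_1, t_2, \ldots, t_{h+1}) = E\{e^{i(t_1 Y_1 + t_2 Y_2 + \cdots + t_{h+1} Y_{h+1})}\}$$

$$= E\{e^{it_1 g_1}\, e^{it_2 g_2} \cdots e^{it_{h+1} g_{h+1}}\}$$

$$= E(e^{it_1 g_1})\, E(e^{it_2 g_2}) \cdots E(e^{it_{h+1} g_{h+1}}) \qquad [\text{by (8.6.5)}$$

$$= E(e^{it_1 Y_1})\, E(e^{it_2 Y_2}) \cdots E(e^{it_{h+1} Y_{h+1}})$$

$$= \chi_{y_1}(t_1)\,\chi_{y_2}(t_2) \cdots \chi_{y_{h+1}}(t_{h+1})$$

Hence, by the above theorem, $Y_1, Y_2, \ldots, Y_{h+1}$ are mutually independent.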


Sum of independent random variables. Let $X_1, X_2, \ldots, X_n$ be mutually independent random variables having characteristic functions $\chi_1(t), \chi_2(t), \ldots, \chi_n(t)$ respectively. Then the characteristic function $K(t)$ of their sum $S_n$ is given by

$$K(t) = E(e^{itS_n}) = E\{e^{it(X_1 + X_2 + \cdots + X_n)}\}$$

$$= E(e^{itX_1}\, e^{itX_2} \cdots e^{itX_n})$$

$$= E(e^{itX_1})\, E(e^{itX_2}) \cdots E(e^{itX_n}) \qquad [\text{by (8.6.5)}$$

or

$$K(t) = \chi_1(t)\,\chi_2(t) \cdots \chi_n(t) \tag{8.8.3}$$

(8.8.3) states an important property of characteristic functions, viz. the characteristic function of a sum of mutually independent random variables is the product of their individual characteristic functions.

For a linear combination $X = a_1 X_1 + a_2 X_2 + \cdots + a_n X_n$, we get using (7.8.5)

$$\chi_x(t) = \chi_1(a_1 t)\,\chi_2(a_2 t) \cdots \chi_n(a_n t) \tag{8.8.4}$$

RzPRODUCTIVE PROPERTIES OF VARIOUS DISTRIBUTIONS 


The reproductive properties of different distributions may be very 
easily established by making use of the characteristic functions 
together with the fact that a characteristic function uniquely deter- 
mines the distribution. 


d. U xXx] Xn are mutually independent binomial variates 
having parameters (v,, p), (vs, DY PCR (va, p) respectively, then their 
sum Sa is also binomially distributed with parameters (v, p), 
where v- y, * ya + ee Бул 


Proof. The characteristic function of Xp хк(@) = (pet + q)”: 
(k=1, 2,...п). Hence by (8.8.3, K(t)=(pett+q)" which is indeed 
the characteristic function of the binomial (у, р) distribution. Hence 
we conclude that S, is binomial (у, p). 


GD ON OS Xn are mutually independent Poisson variates 
having parameters uis н,...... us respectively, then their sum S, is a 


Poisson-(u; + иа + *** + us) Variate. 


166 MATHEMATICAL EXPECTATIONS II [8.9 


Proof. Here x(t) =er:l"-1) (Ik — 1,2...) so that 
K(t) = eie –1) 
Hence the result. 

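3. If $X_1, X_2, \ldots, X_n$ are mutually independent normal variates
having means $m_1, m_2, \ldots, m_n$ and standard deviations $\sigma_1, \sigma_2, \ldots, \sigma_n$
respectively, then any linear combination $X = a_1X_1 + a_2X_2 + \cdots + a_nX_n$
is also normally distributed, with mean $m$ and standard deviation $\sigma$
given by

$m = a_1m_1 + a_2m_2 + \cdots + a_nm_n$
$\sigma^2 = a_1^2\sigma_1^2 + a_2^2\sigma_2^2 + \cdots + a_n^2\sigma_n^2$
Proof. $\chi_k(t) = e^{im_kt - \frac{1}{2}\sigma_k^2t^2}$ and so
$\chi_k(a_kt) = e^{ia_km_kt - \frac{1}{2}a_k^2\sigma_k^2t^2}$ $(k = 1, 2, \ldots, n)$
By (8.8.4)
$\chi(t) = e^{imt - \frac{1}{2}\sigma^2t^2}$
which proves the theorem.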


Remark. The above result is slightly more general than the
reproductive theorem for the normal distribution; the latter is
obtained from the former by putting $a_1 = a_2 = \cdots = a_n = 1$.
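As a small worked illustration of the formulas for the mean and standard deviation of a linear combination of independent normal variates, here is a Python sketch; the function name `linear_combination_normal` and the numerical values are assumptions chosen only for the example.

```python
from math import sqrt

def linear_combination_normal(a, m, sigma):
    """Mean and standard deviation of a_1 X_1 + ... + a_n X_n for independent
    normal variates X_k with means m[k] and standard deviations sigma[k]."""
    mean = sum(ak * mk for ak, mk in zip(a, m))
    variance = sum((ak * sk) ** 2 for ak, sk in zip(a, sigma))
    return mean, sqrt(variance)

# Illustrative case: X = 2 X_1 - X_2 with m = (1, 3), sigma = (0.5, 2).
mean, sd = linear_combination_normal([2, -1], [1.0, 3.0], [0.5, 2.0])
print(mean, sd)   # mean = 2*1 - 3, variance = (2*0.5)^2 + (1*2)^2
```

Putting every coefficient equal to 1 gives the reproductive theorem itself: the means add and the variances add.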


4. If $X_1, X_2, \ldots, X_n$ are mutually independent gamma variates
having parameters $l_1, l_2, \ldots, l_n$ respectively, their sum is a
$\gamma(l_1 + l_2 + \cdots + l_n)$ variate.


Proof. $\chi_k(t) = (1 - it)^{-l_k}$ $(k = 1, 2, \ldots, n)$. The rest is obvious.
8.9 ANOTHER DISCUSSION ON BERNOULLI TRIALS 


As in the example of Sec. 8.5, we define $n$ random variables $X_1,
X_2, \ldots, X_n$ on the event space $S_n$ of a sequence of $n$ Bernoulli
trials as follows: $X_i$ takes the value 0 or 1 corresponding to event
points for which the ith trial results in a failure or success respectively
$(i = 1, 2, \ldots, n)$. Hence


$P(X_i = 0) = q$, $P(X_i = 1) = p$
i.e. $X_i$ is binomial $(1, p)$.




The spectrum of the n-dimensional variate $(X_1, X_2, \ldots, X_n)$ then
consists of the $2^n$ points
$(i_1, i_2, \ldots, i_n)$ $(i_1, i_2, \ldots, i_n = 0, 1)$
each of which corresponds to an event point of $S_n$. The definition of
independence of the $n$ trials gives (cf. Sec. 4.3)
$P(X_1 = i_1, X_2 = i_2, \ldots, X_n = i_n) = P(X_1 = i_1)\, P(X_2 = i_2) \cdots P(X_n = i_n)$
which shows that $X_1, X_2, \ldots, X_n$ are mutually independent.


Now since $X_1, X_2, \ldots, X_n$ are mutually independent variates, each
binomial $(1, p)$, it follows from the reproductive property of the
binomial distribution that their sum $S_n = X_1 + X_2 + \cdots + X_n$, which
represents the number of successes in $n$ trials, has a binomial $(n, p)$
distribution.


Example. RANDOM WALK PROBLEM. This is an interesting
model connected with a Bernoullian sequence of trials. Let a particle
be initially at a point $r$, a given positive integer, on the x-axis. Now
a sequence of $n$ Bernoulli trials is performed, and for each trial the
particle moves by a jump through unit distance either in the forward
or the backward direction, according as the result of the trial is a
success or a failure. Our problem will be to find the probability
distribution of the co-ordinate of the particle after $n$ trials or jumps.

Let the random variable $X_i'$ denote the displacement of the particle
at the ith jump $(i = 1, 2, \ldots, n)$. Then $X_i'$ takes the two values $-1$ and $1$
with probabilities $q$ and $p$ respectively. We note that $X_i'$ does not
exactly have the two-point binomial distribution, but the variate
$X_i = (X_i' + 1)/2$, whose spectrum consists of the points 0 and 1, is indeed
binomial $(1, p)$. After $n$ jumps the final co-ordinate of the particle is

$X' = r + X_1' + X_2' + \cdots + X_n' = 2S_n + r - n$
where $S_n = \sum_{i=1}^{n} X_i$.

From the conditions of the question, the $X_i'$'s or $X_i$'s are mutually
independent, and hence $S_n$ is binomial $(n, p)$. Therefore, the spectrum
of $X'$ is given by (cf. Ex. 4 Sec 5.9)


$x_i' = 2i + r - n$ $(i = 0, 1, 2, \ldots, n)$
$P(X' = x_i') = P(S_n = i) = \binom{n}{i} p^i q^{n-i}$ (8.9.1)
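The distribution (8.9.1) can be checked against a direct enumeration of all $2^n$ sequences of jumps. The following Python sketch is an illustration only; the function names and the values $r = 3$, $n = 5$, $p = 0.4$ are assumptions made for the example.

```python
from itertools import product
from math import comb

def walk_distribution(r, n, p):
    """{x: P(X' = x)} from (8.9.1): x = 2i + r - n with binomial (n, p) weights."""
    q = 1.0 - p
    return {2 * i + r - n: comb(n, i) * p**i * q**(n - i) for i in range(n + 1)}

def walk_by_enumeration(r, n, p):
    """The same distribution obtained by listing every sequence of +1/-1 jumps."""
    q = 1.0 - p
    dist = {}
    for jumps in product((-1, 1), repeat=n):
        prob = 1.0
        for j in jumps:
            prob *= p if j == 1 else q      # independent trials
        x = r + sum(jumps)                  # final co-ordinate
        dist[x] = dist.get(x, 0.0) + prob
    return dist

formula = walk_distribution(r=3, n=5, p=0.4)
brute = walk_by_enumeration(r=3, n=5, p=0.4)
print(sorted(formula))                      # reachable co-ordinates
```

Note that the reachable co-ordinates all have the same parity as $r + n$, since each jump changes the co-ordinate by exactly one unit.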




C. CONDITIONAL EXPECTATIONS AND REGRESSION 
8.10 CONDITIONAL EXPECTATION 


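Discrete case. The conditional expectation or mean value of a
continuous function $g(X, Y)$ of $X$ and $Y$ on the hypothesis $Y = y_j$ is
defined by


$E\{g(X, Y) \mid Y = y_j\} = \sum_i g(x_i, y_j)\, P(X = x_i \mid Y = y_j) = \frac{\sum_i g(x_i, y_j) f_{ij}}{\sum_i f_{ij}}$ (8.10.1)


where $f_{ij} = P(X = x_i, Y = y_j)$, its existence being understood in the usual sense.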


We note that $E\{g(X, Y) \mid Y = y_j\}$ is nothing but the expectation of
the function $g(X, y_j)$ of $X$ with respect to the conditional distribution
of $X$ on the hypothesis $Y = y_j$.


The conditional mean of $X$, or the mean of the conditional distribution
of $X$ on the hypothesis $Y = y_j$, is defined by


$m_{x|j} = E(X \mid Y = y_j) = \frac{\sum_i x_i f_{ij}}{\sum_i f_{ij}}$ (8.10.2)


which obviously represents the centre of mass of the probability mass
points on the line $y = y_j$.


The conditional variance of $X$ on the hypothesis $Y = y_j$ is likewise
given by
$\sigma_{x|j}^2 = \mathrm{Var}(X \mid Y = y_j) = E\{(X - m_{x|j})^2 \mid Y = y_j\}$ (8.10.3)
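To make the definitions (8.10.2) and (8.10.3) concrete, here is a minimal Python sketch that computes the conditional mean and variance of $X$ from a joint probability table. The table `f`, the spectra `xs`, `ys` and the function name are hypothetical, chosen only for illustration; exact fractions are used so that the results can be checked by hand.

```python
from fractions import Fraction as F

# Hypothetical joint table: f[i][j] = P(X = xs[i], Y = ys[j]).
xs = [0, 1, 2]
ys = [0, 1]
f = [[F(1, 8), F(1, 8)],
     [F(2, 8), F(1, 8)],
     [F(1, 8), F(2, 8)]]

def conditional_mean_var_of_x(j):
    """Mean and variance of X on the hypothesis Y = ys[j]."""
    column = [f[i][j] for i in range(len(xs))]
    marginal = sum(column)                                   # P(Y = y_j)
    mean = sum(x * c for x, c in zip(xs, column)) / marginal
    var = sum((x - mean) ** 2 * c for x, c in zip(xs, column)) / marginal
    return mean, var

m0, v0 = conditional_mean_var_of_x(0)
m1, v1 = conditional_mean_var_of_x(1)
print(m0, v0, m1, v1)
```

Each column of the table, divided by its total, is the conditional distribution of $X$; the mean is the centre of mass of that column.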


The definitions of other characteristics of the conditional distribution
of $X$ on the hypothesis $Y = y_j$ may be easily constructed. The
conditional expectation of $g(X, Y)$ and the conditional mean, variance
etc. of $Y$ on the hypothesis $X = x_i$ are also defined in an exactly similar
manner.

If $X$ and $Y$ are independent, we have
$f_{ij} = P(X = x_i)\, P(Y = y_j)$
and hence

$E\{g(X) \mid Y = y_j\} = E\{g(X)\}$, $E\{h(Y) \mid X = x_i\} = E\{h(Y)\}$ (8.10.4)

It follows, in particular, that
$m_{x|j} = m_x$, $m_{y|i} = m_y$, $\sigma_{x|j} = \sigma_x$, $\sigma_{y|i} = \sigma_y$ (8.10.5)
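Continuous case. The conditional expectation of $g(X, Y)$ on the
hypothesis $Y = y$ is defined by


$E\{g(X, Y) \mid Y = y\} = \int_{-\infty}^{\infty} g(x, y) f_x(x \mid y)\,dx = \frac{\int_{-\infty}^{\infty} g(x, y) f(x, y)\,dx}{\int_{-\infty}^{\infty} f(x, y)\,dx}$ (8.10.6)


The conditional mean of $X$ on the hypothesis $Y = y$ is a function of
$y$ to be denoted by $m_{x|y}$ or, more conveniently, by $m_x(y)$ and defined by


$m_x(y) = E(X \mid Y = y) = \frac{\int_{-\infty}^{\infty} x f(x, y)\,dx}{\int_{-\infty}^{\infty} f(x, y)\,dx}$ (8.10.7)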


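Similarly, we define


$m_y(x) = E(Y \mid X = x) = \frac{\int_{-\infty}^{\infty} y f(x, y)\,dy}{\int_{-\infty}^{\infty} f(x, y)\,dy}$ (8.10.8)
The conditional variances $\sigma_{x|y}^2$ or $\sigma_x^2(y)$ and $\sigma_{y|x}^2$ or $\sigma_y^2(x)$ are
defined by
$\sigma_x^2(y) = \mathrm{var}(X \mid Y = y) = E[\{X - m_x(y)\}^2 \mid Y = y]$ (8.10.9)
$\sigma_y^2(x) = \mathrm{var}(Y \mid X = x) = E[\{Y - m_y(x)\}^2 \mid X = x]$ (8.10.10)
If $X$ and $Y$ are independent,
$f_x(x \mid y) = f_x(x)$, $f_y(y \mid x) = f_y(y)$
which lead to
$m_x(y) = m_x$, $m_y(x) = m_y$, $\sigma_x^2(y) = \sigma_x^2$, $\sigma_y^2(x) = \sigma_y^2$ (8.10.11)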


8.11 REGRESSION CURVES 


In another terminology, the conditional mean $m_y(x)$, for a continuous
distribution, is called the regression function of $Y$ on $X$ and the curve
$y = m_y(x)$ (8.11.1)
the regression curve of $Y$ on $X$ or sometimes the regression curve for the
mean of $Y$. (Regression is a peculiar word which has come into use
whose literal meaning has very little to do with its mathematical
definition!) Geometrically, the regression function $m_y(x)$ represents
the y-co-ordinate of the centre of mass of the bivariate probability mass
in the infinitesimal vertical strip bounded by $x$ and $x + dx$, which
follows readily from (8.10.8), and hence the regression curve of $Y$ on
$X$ is the locus of this centre of mass as $x$ varies.


Similarly, the regression function of $X$ on $Y$ is $m_x(y)$, and the
regression curve of $X$ on $Y$ is given by
$x = m_x(y)$ (8.11.2)
Thus equations (8.11.1) and (8.11.2) give the two regression curves
of a continuous bivariate distribution.

In case a regression curve is a straight line, the corresponding
regression is said to be linear. If one of the regressions is linear, it
does not, however, follow that the other is also linear.


Remark. We can also develop the idea of regression curves for
the mean for discrete distributions, but that is relatively unimportant.
In the discrete case, the analogue of the regression curve of $Y$ on $X$
will not be a continuous curve but a disconnected set of points, viz.
$(x_i, m_{y|i})$ $(i = 0, \pm 1, \pm 2, \ldots)$; we may, if we like, connect the consecutive
points by straight lines for convenience.


1. The expectation of the regression function of $Y$ on $X$ treated
as a random variable, i.e. of $m_y(X)$, is readily obtained from (8.10.8),
which gives


$E\{m_y(X)\} = \int_{-\infty}^{\infty} m_y(x) f_x(x)\,dx = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} y f(x, y)\,dx\,dy$


or


$E\{m_y(X)\} = m_y$ (8.11.3)
Hence
$\sigma^2\{m_y(X)\} = E[\{m_y(X) - m_y\}^2]$ (8.11.4)


This gives a measure of deviation of the regression curve $y = m_y(x)$ from
the horizontal line $y = m_y$.




2. We know that the conditional variance $\sigma_y^2(x)$ is a measure of
dispersion of the conditional distribution of $Y$ on the hypothesis $X = x$,
and hence that of the two-dimensional mass distribution lying in the
strip between $x$ and $x + dx$ about the conditional mean $m_y(x)$, for a
fixed value of $x$. Let us now see if $E\{\sigma_y^2(X)\}$ also reduces to $\sigma_y^2$
or not.


$E\{\sigma_y^2(X)\} = \int_{-\infty}^{\infty} \sigma_y^2(x) f_x(x)\,dx = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \{y - m_y(x)\}^2 f(x, y)\,dx\,dy$
or
$E\{\sigma_y^2(X)\} = E[\{Y - m_y(X)\}^2]$ (8.11.5)


which is, in general, different from $\sigma_y^2$ and gives a measure of dispersion
of the bivariate distribution about the regression curve $y = m_y(x)$.
This is called the variance of $Y$ about the regression function of $Y$ on $X$
and denoted by $\sigma_{yx}^2$, i.e.
$\sigma_{yx}^2 = E[\{Y - m_y(X)\}^2]$ (8.11.6)
We define $\sigma_{xy}^2$ similarly.


3. Minimum property. An immediate consequence of the fact
that the conditional second moment is minimum when taken about
the corresponding conditional mean is the following important theorem:
For any continuous function $g(x)$, $E[\{Y - g(X)\}^2]$ is minimum when
$g(x) = m_y(x)$.


Proof.
$E[\{Y - g(X)\}^2] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \{y - g(x)\}^2 f(x, y)\,dx\,dy$
$= \int_{-\infty}^{\infty} f_x(x)\,dx \int_{-\infty}^{\infty} \{y - g(x)\}^2 f_y(y \mid x)\,dy$
$= \int_{-\infty}^{\infty} f_x(x)\, E[\{Y - g(x)\}^2 \mid X = x]\,dx$


Now the expectation within the integral represents the conditional
second moment of $Y$ on the hypothesis $X = x$ about the point $g(x)$,
which, we know, is minimum when $g(x) = m_y(x)$, and hence the
theorem follows.


Fig. 18 


The geometrical interpretation of the above theorem reveals the
following minimum property of the regression curves. Here $y = g(x)$
represents any continuous curve, and $E[\{Y - g(X)\}^2]$ is the mean value
of the square of the deviation of the distribution from the curve
$y = g(x)$ measured in the direction of the y-axis, which is thus a
measure of dispersion about the curve $y = g(x)$. Now the theorem
states that among all continuous curves this mean value is minimum
for the regression curve $y = m_y(x)$. Hence among all continuous
curves, the one which minimises the mean value of the square of the
deviation in the direction of the y-axis is the regression curve of $Y$
on $X$.

4. If $X$ and $Y$ are independent, it follows from (8.10.11) that both
the regressions, $Y$ on $X$ and $X$ on $Y$, are linear, the regression curves
being $y = m_y$ and $x = m_x$, which are straight lines parallel to the axes.


Examples 


1. Find the regression curves for the mean in the example of
Sec. 6.3.


$\int_0^{1-x} y f(x, y)\,dy = 6\int_0^{1-x} y(1 - x - y)\,dy = (1 - x)^3$, $f_x(x) = 3(1 - x)^2$
So
$m_y(x) = \frac{1}{3}(1 - x)$ $(0 < x < 1)$

Hence the regression curve for the mean of $Y$ is


$y = \frac{1}{3}(1 - x)$ $(0 < x < 1)$
Similarly the regression curve for the mean of $X$ is
$x = \frac{1}{3}(1 - y)$ $(0 < y < 1)$
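The regression function found above can be checked by numerical integration of (8.10.8). The sketch below assumes that the density of the example of Sec. 6.3 is $f(x, y) = 6(1 - x - y)$ on the triangle $x > 0$, $y > 0$, $x + y < 1$, as the integrand of the example suggests; the function names and the midpoint rule are choices made for this illustration only.

```python
def density(x, y):
    """Assumed joint density: 6(1 - x - y) inside the triangle, 0 outside."""
    return 6.0 * (1.0 - x - y) if x > 0 and y > 0 and x + y < 1 else 0.0

def regression_of_y_on_x(x, steps=2000):
    """m_y(x) from (8.10.8), both integrals taken by the midpoint rule on (0, 1 - x)."""
    h = (1.0 - x) / steps
    numerator = denominator = 0.0
    for k in range(steps):
        y = (k + 0.5) * h
        fxy = density(x, y)
        numerator += y * fxy * h        # integral of y f(x, y) dy
        denominator += fxy * h          # integral of f(x, y) dy, the marginal density
    return numerator / denominator

for x in (0.1, 0.5, 0.8):
    print(x, regression_of_y_on_x(x), (1.0 - x) / 3.0)
```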


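2. BIVARIATE NORMAL DISTRIBUTION. From Ex. 3 Sec. 6.5
we have


$m_y(x) = m_y + \rho\,\frac{\sigma_y}{\sigma_x}(x - m_x)$


Hence the regression curve of $Y$ on $X$ is


$y = m_y + \rho\,\frac{\sigma_y}{\sigma_x}(x - m_x)$


or


$\frac{y - m_y}{\sigma_y} = \rho\,\frac{x - m_x}{\sigma_x}$
Similarly, the regression curve of $X$ on $Y$ is
$\frac{y - m_y}{\sigma_y} = \frac{1}{\rho}\,\frac{x - m_x}{\sigma_x}$
Thus for the bivariate normal distribution both the regressions are
linear.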


8.12 LEAST SQUARE REGRESSION CURVES


By dilating the minimum property only, we can introduce a very 
general and useful concept of regression curves known as least square 
regression curves. The principle of least square is, however, a broad 
mathematical principle which, in this case, may be precisely stated as 
follows. Let 


$y = g(x;\, c_0, c_1, \ldots)$ (8.12.1)
be a family of curves, $c_0, c_1, \ldots$ being the parameters of the family.
The principle of least squares consists in minimising the mean value

$S = E[\{Y - g(X;\, c_0, c_1, \ldots)\}^2]$ (8.12.2)
which is a function of the parameters $c_0, c_1, \ldots$ and which gives a
measure of dispersion of the probability mass distribution about the
curve (8.12.1). If $S$ is minimum for $c_0 = c_0^*$, $c_1 = c_1^*$, $\ldots$, then the
curve

$y = g(x;\, c_0^*, c_1^*, \ldots)$ (8.12.3)
is said to be the best-fitting curve of the family (8.12.1) to the distribution
according to the principle of least squares and will be
called the least square regression curve of $Y$ on $X$ belonging to
the given family. The function $g(x;\, c_0^*, c_1^*, \ldots)$ is called the least
square regression function of $Y$ on $X$, and the corresponding random
variable
$U_y = g(X;\, c_0^*, c_1^*, \ldots)$ (8.12.4)
the best representation of $Y$ by a function of $X$ of the family
$g(X;\, c_0, c_1, \ldots)$ according to the least square principle. The variate


$V_y = Y - U_y$ (8.12.5)


which is the part of $Y$ left after taking away its best representation,
is called the residual of $Y$. We note


$S_{\min} = E[\{Y - g(X;\, c_0^*, c_1^*, \ldots)\}^2] = E(V_y^2)$ (8.12.6)


Clearly, $S_{\min}$ is a measure of dispersion about the regression curve
(8.12.3) and hence an inverse measure of goodness of fit of the
regression curve to the probability distribution.


The equations for minimising $S$ are


$\frac{\partial S}{\partial c_0} = 0$, $\frac{\partial S}{\partial c_1} = 0$, $\ldots$ (8.12.7)
which are called the normal equations. By solving these equations
we get the least square values $c_0^*, c_1^*, \ldots$ of the parameters $c_0, c_1, \ldots$
respectively.


Similar formulations hold for the least square regression curves
of $X$ on $Y$. If, however, we consider the family of all continuous
curves, it follows from the last section that for a continuous distribution
the least square regression curve of $Y$ on $X$ turns out to be
the regression curve for the mean of $Y$. Thus we see that the
regression for the mean may be obtained as a particular case of
least square regression.




The most important of the least square regression curves are the 
regression lines, although other types of curves are also sometimes 
used. We shall, for convenience, often omit the phrase least square 
qualifying a regression curve which will be implied by the context.


8.13 REGRESSION LINES 
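Here we are concerned with the family of straight lines


$y = c_0 + c_1x$ (8.13.1)
so that
$S = E\{(Y - c_0 - c_1X)^2\}$ (8.13.2)
The normal equations are $\partial S/\partial c_0 = 0$ and $\partial S/\partial c_1 = 0$ which, on putting
$c_0 = c_0^*$, $c_1 = c_1^*$, reduce to
$E(Y - c_0^* - c_1^*X) = 0$ (8.13.3)
and
$E\{X(Y - c_0^* - c_1^*X)\} = 0$ (8.13.4)
or


$c_0^* + c_1^* m_x = m_y$
$c_0^* m_x + c_1^* \alpha_{20} = \alpha_{11}$


Solving these we get
$c_0^* = m_y - \rho\,\frac{\sigma_y}{\sigma_x}\, m_x$, $c_1^* = \rho\,\frac{\sigma_y}{\sigma_x}$ (8.13.5)
Therefore, the regression line of $Y$ on $X$ is
$y = c_0^* + c_1^*x = m_y + \rho\,\frac{\sigma_y}{\sigma_x}(x - m_x)$ (8.13.6)
The coefficient of $x$, $c_1^* = \rho\,\sigma_y/\sigma_x$, is called the regression coefficient
of $Y$ on $X$ and denoted by $\beta_{yx}$, i.e.
$\beta_{yx} = \rho\,\frac{\sigma_y}{\sigma_x}$ (8.13.7)


Equation (8.13.6) may also be written in the form


$\frac{y - m_y}{\sigma_y} = \rho\,\frac{x - m_x}{\sigma_x}$ (8.13.8)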




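We have
$(Y - c_0^* - c_1^*X)^2 = \{Y - m_y - \rho\,\frac{\sigma_y}{\sigma_x}(X - m_x)\}^2$
$= (Y - m_y)^2 + \rho^2\,\frac{\sigma_y^2}{\sigma_x^2}(X - m_x)^2 - 2\rho\,\frac{\sigma_y}{\sigma_x}(X - m_x)(Y - m_y)$
So
$E\{(Y - c_0^* - c_1^*X)^2\} = \sigma_y^2 + \rho^2\,\frac{\sigma_y^2}{\sigma_x^2}\,\sigma_x^2 - 2\rho\,\frac{\sigma_y}{\sigma_x}\,\rho\,\sigma_x\sigma_y$
or
$E\{(Y - c_0^* - c_1^*X)^2\} = \sigma_y^2(1 - \rho^2)$ (8.13.9)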
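The normal equations and the result (8.13.9) can be verified on a small example. The following Python sketch uses a hypothetical four-point joint distribution (the dictionary `joint` and all names are assumptions made for illustration); it solves the two normal equations in terms of the moments and checks that the minimum mean square deviation equals $\sigma_y^2(1 - \rho^2)$, using exact fractions.

```python
from fractions import Fraction as F

# Hypothetical joint distribution: {(x, y): probability}.
joint = {(0, 0): F(1, 4), (1, 0): F(1, 4), (1, 1): F(1, 4), (2, 1): F(1, 4)}

def moment(p, q):
    """alpha_pq = E(X^p Y^q)."""
    return sum(pr * x**p * y**q for (x, y), pr in joint.items())

m_x, m_y = moment(1, 0), moment(0, 1)
a20, a11 = moment(2, 0), moment(1, 1)

# Normal equations:  c0 + c1*m_x = m_y ,  c0*m_x + c1*a20 = a11
c1 = (a11 - m_x * m_y) / (a20 - m_x**2)
c0 = m_y - c1 * m_x

var_x = a20 - m_x**2
var_y = moment(0, 2) - m_y**2
cov = a11 - m_x * m_y
rho_sq = cov**2 / (var_x * var_y)              # square of the correlation coefficient

# Mean square deviation about the fitted line, computed directly.
s_min = sum(pr * (y - c0 - c1 * x) ** 2 for (x, y), pr in joint.items())
print(c0, c1, s_min, var_y * (1 - rho_sq))
```

Working with $\rho^2$ rather than $\rho$ keeps the arithmetic rational; the slope itself is $\mathrm{cov}(X, Y)/\sigma_x^2$, which is the same as $\rho\,\sigma_y/\sigma_x$.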
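Similarly, the regression line of $X$ on $Y$ is
$x = d_0^* + d_1^*y$
where
$d_0^* = m_x - \rho\,\frac{\sigma_x}{\sigma_y}\, m_y$, $d_1^* = \rho\,\frac{\sigma_x}{\sigma_y}$ (8.13.10)
or
$\frac{y - m_y}{\sigma_y} = \frac{1}{\rho}\,\frac{x - m_x}{\sigma_x}$ (8.13.11)
The regression coefficient of $X$ on $Y$,
$\beta_{xy} = \rho\,\frac{\sigma_x}{\sigma_y}$ (8.13.12)
and
$E\{(X - d_0^* - d_1^*Y)^2\} = \sigma_x^2(1 - \rho^2)$ (8.13.13)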


1. Significance of $\rho$. We remarked earlier that if $\rho = 0$, $X$ and
$Y$ are not necessarily independent, but if $\rho = \pm 1$, we can conclude
that $Y$ is a linear function of $X$. The latter result becomes obvious
from (8.13.9) or (8.13.13). If $\rho = \pm 1$, the regression lines (8.13.8)
and (8.13.11) coincide, and it follows that


$Y = c_0^* + c_1^*X = m_y \pm \frac{\sigma_y}{\sigma_x}(X - m_x)$


which is a linear function of $X$, i.e. the whole probability mass is
confined on the coincident regression lines. We shall now show
that $|\rho|$, in fact, gives a measure of linear dependence of $X$ and $Y$.

From (8.13.9) or (8.13.13) we get $0 \le |\rho| \le 1$, and the left-hand
sides of (8.13.9) and (8.13.13), which are measures of dispersion
about the corresponding regression lines, are both proportional
to $1 - \rho^2$. This shows that $|\rho|$ is a measure of concentration of the
probability mass about the regression lines which are the best-fitting
lines to the distribution, or, in other words, $|\rho|$ is a measure
of linear dependence of $X$ and $Y$. We may also say that $|\rho|$ is a
measure of goodness of fit of the regression lines to the distribution.
Moreover, it is a very satisfactory measure in view of the
fact that the correlation coefficient is dimensionless and independent
of the choice of origins and scales of measurements.




Fig. 19. Least Square Regression Lines 


2. When $\rho = 0$, the regression lines (8.13.8) and (8.13.11) respectively
reduce to $y = m_y$ and $x = m_x$. For $\rho > 0$, both the regression lines
have positive slopes, while for $\rho < 0$ they have negative slopes.


3. The best representation of $Y$,


$U_y = c_0^* + c_1^*X = m_y + \rho\,\frac{\sigma_y}{\sigma_x}(X - m_x)$ (8.13.14)
Hence
$E(U_y) = m_y$, $\sigma(U_y) = |\rho|\,\sigma_y$ (8.13.15)
and
$\rho(U_y, Y) = |\rho| \ge 0$ (8.13.16)


Thus we may say that the correlation coefficient between $Y$ and its
best representation is a measure of goodness of fit of the regression
lines to the probability distribution.




4. The residual


$V_y = Y - U_y = Y - c_0^* - c_1^*X$ (8.13.17)
Now the normal equations state
$E(V_y) = 0$, $E(XV_y) = 0$ (8.13.18)
By (8.13.9)
$\sigma^2(V_y) = E(V_y^2) = \sigma_y^2(1 - \rho^2)$ (8.13.19)
or using (8.13.15)
$\sigma^2(V_y) = \sigma_y^2 - \sigma^2(U_y)$ (8.13.20)


This expresses that the variance of the residual of $Y$ is the amount
of the variance of $Y$ left after subtraction of the variance of its
best representation; in this sense $\sigma^2(V_y)$ is sometimes called the
residual variance of $Y$. It follows from (8.13.19) that the residual
variance $\sigma^2(V_y)$ is a measure of dispersion about the regression
line of $Y$ on $X$.


From (8.13.18) $\mathrm{cov}(X, V_y) = E\{(X - m_x)V_y\} = 0$, so that
$\rho(X, V_y) = 0$ (8.13.21)
Also then
$\rho(U_y, V_y) = 0$ (8.13.22)


5. If the regression curve for the mean of $Y$, $y = m_y(x)$, happens
to be a straight line, then it must be identical with the least square
regression line of $Y$ on $X$, for while competing among all possible
continuous curves, the straight line $y = m_y(x)$ is the minimising curve,
and hence it must also be the minimising curve while competing
in its own family of straight lines.


Remark. The idea of least square fitting in the case of a one-dimensional
distribution, although trivial, will be interesting to note.
The problem here will reduce to fitting a point $c$ to the distribution
of $X$ by minimising $E\{(X - c)^2\}$; the normal equation is $E(X - c) = 0$
and, therefore, $c^* = E(X) = m$. Thus the mean of any distribution
is the best-fitting point to the distribution according to the principle
of least squares.




Examples 
1. Find the regression lines in Ex.1 Sec. 6.2. From Ex. 1 
Sec. 8.3. 


Вуз = Hilos? = — i, Bay = ttsiloy? = — d 
Hence the regression lines are 
y-328=-3(0-) , (Y on X) 
х-5% —4&(y-) (X on Y) 


2. Find the regression lines in the example of Sec. 6.3.


Here
$m_x = m_y = \frac{1}{4}$, $\sigma_x^2 = \sigma_y^2 = \frac{3}{80}$, $\rho = -\frac{1}{3}$
so that $\beta_{yx} = \beta_{xy} = -\frac{1}{3}$. Hence the regression line of $Y$ on $X$ and
that of $X$ on $Y$ are respectively
$y - \frac{1}{4} = -\frac{1}{3}(x - \frac{1}{4})$, $x - \frac{1}{4} = -\frac{1}{3}(y - \frac{1}{4})$
or
$y = \frac{1}{3}(1 - x)$, $x = \frac{1}{3}(1 - y)$

It appears from Ex. 1 Sec. 8.11 that these regression lines are the
same as the corresponding regression curves for the means. In fact,
in this case both the regression curves for the means are straight lines
and hence must coincide with the regression lines.


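8.14 PARABOLIC CURVE FITTING
For the regression of $Y$ on $X$, we consider the family of kth degree
parabolas


$y = c_0 + c_1x + c_2x^2 + \cdots + c_kx^k$ (8.14.1)
$c_0, c_1, \ldots, c_k$ being the $(k + 1)$ parameters of the family. Set
$S = E\{(Y - c_0 - c_1X - c_2X^2 - \cdots - c_kX^k)^2\}$ (8.14.2)
The normal equations are
$\frac{\partial S}{\partial c_0} = 0$, $\frac{\partial S}{\partial c_1} = 0$, $\ldots$, $\frac{\partial S}{\partial c_k} = 0$


Hence, for $c_0 = c_0^*$, $c_1 = c_1^*$, $\ldots$, $c_k = c_k^*$, we have
$E(Y - c_0^* - c_1^*X - c_2^*X^2 - \cdots - c_k^*X^k) = 0$
$E\{X(Y - c_0^* - c_1^*X - c_2^*X^2 - \cdots - c_k^*X^k)\} = 0$ (8.14.3)
$\cdots$
$E\{X^k(Y - c_0^* - c_1^*X - c_2^*X^2 - \cdots - c_k^*X^k)\} = 0$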




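In terms of the moments $\alpha_{pq}$, these reduce to


$c_0^*\alpha_{00} + c_1^*\alpha_{10} + c_2^*\alpha_{20} + \cdots + c_k^*\alpha_{k0} = \alpha_{01}$
$c_0^*\alpha_{10} + c_1^*\alpha_{20} + c_2^*\alpha_{30} + \cdots + c_k^*\alpha_{k+1,0} = \alpha_{11}$ (8.14.4)
$\cdots$
$c_0^*\alpha_{k0} + c_1^*\alpha_{k+1,0} + c_2^*\alpha_{k+2,0} + \cdots + c_k^*\alpha_{2k,0} = \alpha_{k1}$
These equations give $c_0^*, c_1^*, \ldots, c_k^*$. The best-fitting parabola of
degree $k$ is then
$y = c_0^* + c_1^*x + c_2^*x^2 + \cdots + c_k^*x^k$ (8.14.5)
and the best representation of $Y$ by a kth degree polynomial in $X$ is
$U_y = c_0^* + c_1^*X + c_2^*X^2 + \cdots + c_k^*X^k$ (8.14.6)
The residual is given by
$V_y = Y - U_y = Y - c_0^* - c_1^*X - \cdots - c_k^*X^k$ (8.14.7)
The first normal equation states:
$E(V_y) = 0$ (8.14.8)
Hence
$\sigma^2(V_y) = E(V_y^2) = S_{\min}$ (8.14.9)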

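Now we can easily calculate $S_{\min}$ by the following tricky use of
the normal equations. Write


$S = E\{(\kappa Y - c_0 - c_1X - \cdots - c_kX^k)^2\}$


where $\kappa = 1$, so that $S$ becomes a homogeneous function of $\kappa, c_0,
c_1, \ldots, c_k$ of degree 2. Hence, by Euler's theorem on homogeneous functions,


$2S = \kappa\,\frac{\partial S}{\partial \kappa} + c_0\,\frac{\partial S}{\partial c_0} + c_1\,\frac{\partial S}{\partial c_1} + \cdots + c_k\,\frac{\partial S}{\partial c_k}$
and, since the derivatives with respect to $c_0, c_1, \ldots, c_k$ vanish at the minimum,
$2S_{\min} = \left[\kappa\,\frac{\partial S}{\partial \kappa}\right]_{\kappa = 1,\ c_0 = c_0^*,\ \ldots,\ c_k = c_k^*}$
$= \left[\frac{\partial S}{\partial \kappa}\right]_{\kappa = 1,\ c_0 = c_0^*,\ \ldots,\ c_k = c_k^*}$
$= 2E\{Y(Y - c_0^* - c_1^*X - \cdots - c_k^*X^k)\}$
or


$S_{\min} = \alpha_{02} - c_0^*\alpha_{01} - c_1^*\alpha_{11} - \cdots - c_k^*\alpha_{k1}$ (8.14.10)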
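The system (8.14.4) and the formula (8.14.10) lend themselves to direct computation. The following Python sketch is an illustration under assumptions: the joint distribution `joint` is hypothetical, and `solve` is a small Gauss-Jordan routine written for this example (not a library call). It forms the moments $\alpha_{pq}$, solves the normal equations exactly, and evaluates $S_{\min}$ for $k = 1$ and $k = 2$.

```python
from fractions import Fraction as F

# Hypothetical joint distribution: {(x, y): probability}.
joint = {(-1, 1): F(1, 4), (0, 0): F(1, 2), (1, 1): F(1, 8), (1, 2): F(1, 8)}

def alpha(p, q):
    """Moment alpha_pq = E(X^p Y^q)."""
    return sum(pr * x**p * y**q for (x, y), pr in joint.items())

def solve(a, b):
    """Solve the linear system a c = b by Gauss-Jordan elimination over fractions."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [row[n] for row in m]

def fit_parabola(k):
    """Coefficients c_0*, ..., c_k* from (8.14.4) and S_min from (8.14.10)."""
    a = [[alpha(i + j, 0) for j in range(k + 1)] for i in range(k + 1)]
    b = [alpha(i, 1) for i in range(k + 1)]
    c = solve(a, b)
    s_min = alpha(0, 2) - sum(cj * alpha(j, 1) for j, cj in enumerate(c))
    return c, s_min

line, s_line = fit_parabola(1)
parabola, s_parabola = fit_parabola(2)
print(line, s_line, parabola, s_parabola)
```

Since the straight lines form a sub-family of the second degree parabolas, $S_{\min}$ cannot increase when $k$ goes from 1 to 2. The system is solvable here because $X$ takes three distinct values; with fewer than $k + 1$ distinct values the coefficient matrix would be singular.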




We know that $S_{\min}$ may be taken to be an inverse measure of
goodness of fit of the regression parabola (8.14.5) to the probability
distribution; but this measure has the obvious defect of not being
dimensionless, and also its range of variation is unknown. For the
regression lines, however, it was found that the numerical value
of the correlation coefficient furnishes a dimensionless measure of
fit, and that $0 \le |\rho| \le 1$. In order to obtain such a satisfactory
measure for polynomial regression also, we first reduce it to a
case of linear regression by the transformation

$U_y = c_0^* + c_1^*X + \cdots + c_k^*X^k$ (8.14.11)


Consider now the joint distribution of the variates $U_y$ and $Y$,
which is uniquely determined by the given bivariate distribution
of $X$, $Y$, and let us find the regression line of $Y$ on $U_y$. For this,
we consider the family of straight lines


$y = a + bu_y$
in the $(u_y, y)$-plane and minimise
$E\{(Y - a - bU_y)^2\} = E[\{Y - (a + bc_0^*) - bc_1^*X - \cdots - bc_k^*X^k\}^2]$


Since the R.H.S. is of the form (8.14.2), it is obviously minimum for
$a = 0$, $b = 1$, so that the regression line of $Y$ on $U_y$ is $y = u_y$.


Therefore, the best representation of $Y$ by a linear function of $U_y$
is $U_y$, and the corresponding residual of $Y$ is $Y - U_y = V_y$.
It follows from (8.13.16) that $\rho(U_y, Y) \ge 0$, and hence


$0 \le \rho(U_y, Y) \le 1$ (8.14.12)
By (8.13.19)
$\sigma^2(V_y) = \sigma_y^2\{1 - \rho^2(U_y, Y)\}$ (8.14.13)
which at once shows that the correlation coefficient $\rho(U_y, Y)$, whose
limits are given by (8.14.12), is the required dimensionless measure of
goodness of fit of the regression parabola of $Y$ on $X$ to the distribution.
It also follows from (8.13.15), (8.13.20), (8.13.22) that
$E(U_y) = m_y$, $\sigma(U_y) = \rho(U_y, Y)\,\sigma_y$
$\sigma^2(V_y) = \sigma_y^2 - \sigma^2(U_y)$, $\rho(U_y, V_y) = 0$ (8.14.14)
where $U_y$ and $V_y$ are given by (8.14.6) and (8.14.7) respectively.




8.15 CORRELATION RATIO 
Returning to the topic of regression for the mean, we may also
be interested in constructing a dimensionless measure of goodness
of fit of the regression curve, say, for the mean of $Y$, $y = m_y(x)$.
We know that the problem of regression for the mean can also
be considered as a least square regression problem corresponding
to the family of all possible continuous curves, and, by following
a method similar to that in the last section, we reduce this problem
to one of least square regression lines by setting

$U_y' = m_y(X)$ (8.15.1)


which is evidently the best representation of $Y$ by any continuous
function of $X$.


It is easy to see that, for the joint distribution of $U_y'$ and $Y$,
the least square regression line of $Y$ on $U_y'$ is $y = u_y'$, for
$E\{(Y - a - bU_y')^2\} = E[\{Y - a - bm_y(X)\}^2]$
is certainly minimum when $a = 0$, $b = 1$.


Hence the best representation of $Y$ by a linear function of $U_y'$ is
simply $U_y'$, and the residual of $Y$ is $Y - U_y' = V_y'$ (say). We note
that $V_y'$ may also be interpreted as the residual of $Y$ corresponding
to its best representation by any continuous function of $X$.


From (8.13.18) $E(V_y') = 0$, and hence by (8.11.6)
$\sigma^2(V_y') = E(V_y'^2) = E[\{Y - m_y(X)\}^2] = \sigma_{yx}^2$
By (8.13.16) and (8.13.19)
$0 \le \rho\{m_y(X), Y\} \le 1$ (8.15.2)
and
$E[\{Y - m_y(X)\}^2] = \sigma_y^2[1 - \rho^2\{m_y(X), Y\}]$ (8.15.3)
This shows that $\rho\{m_y(X), Y\}$ is a measure of goodness of fit of the
regression curve $y = m_y(x)$ to the distribution, which is dimensionless
and non-negative. This correlation coefficient between the regression
function of $Y$ on $X$ (treated as a random variable) and $Y$ is called
the correlation ratio of $Y$ on $X$ and denoted by $\eta_{yx}$, i.e.


$\eta_{yx} = \rho\{m_y(X), Y\}$ (8.15.4)




By (8.15.2) and (8.15.3)
$0 \le \eta_{yx} \le 1$ (8.15.5)
$\sigma_{yx}^2 = \sigma_y^2(1 - \eta_{yx}^2)$ (8.15.6)
Also (8.13.15), (8.13.20), (8.13.22) will hold if we replace $U_y$ and
$V_y$ by $U_y'$ and $V_y'$ respectively, i.e.
$E\{m_y(X)\} = m_y$, $\sigma\{m_y(X)\} = \eta_{yx}\,\sigma_y$
$\sigma_{yx}^2 = \sigma_y^2 - \sigma^2\{m_y(X)\}$, $\rho\{m_y(X), Y - m_y(X)\} = 0$ (8.15.7)

1. If $\eta_{yx} = 1$, $\sigma_{yx} = 0$, and hence $Y = m_y(X)$, i.e. all the probability
mass is situated on the regression curve $y = m_y(x)$.


2. If $\eta_{yx} = 0$, $\sigma\{m_y(X)\} = 0$, hence $m_y(x) = m_y$, i.e. the regression
curve for the mean of $Y$ is the straight line $y = m_y$, which must then
be identical with the regression line (8.13.8). Comparing these two
we get $\beta_{yx} = 0$ or $\rho = 0$. Thus $\eta_{yx} = 0$ implies $\rho = 0$.


3. In case the regression function $m_y(x)$ is linear, we must
have $m_y(x) = c_0^* + c_1^*x$, so that from (8.15.4) $\eta_{yx} = |\rho|$.


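Example. Calculate $\sigma_{yx}^2$ and $\eta_{yx}$ in the example of Sec. 6.3.


From Ex. 1 Sec. 8.11


$\sigma_{yx}^2 = 6\int_0^1\int_0^{1-x} \{y - \frac{1}{3}(1 - x)\}^2 (1 - x - y)\,dy\,dx = \frac{1}{30}$


Since $\sigma_y^2 = \frac{3}{80}$, by (8.15.6) $\eta_{yx} = \frac{1}{3}$.


In fact, here the regression function $m_y(x)$ is linear, and so by
Observation 3 above $\eta_{yx} = |\rho| = \frac{1}{3}$.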
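The value of $\sigma_{yx}^2$ in this example can be confirmed by a crude numerical double integration. The Python sketch below assumes, as in the integrand above, the density $6(1 - x - y)$ on the triangle and the regression function $m_y(x) = (1 - x)/3$; the midpoint rule, the step count and the names are choices made only for this illustration.

```python
def sigma_yx_squared(steps=400):
    """Midpoint-rule value of 6 * double integral of {y - (1-x)/3}^2 (1 - x - y)
    over the triangle 0 < x < 1, 0 < y < 1 - x."""
    hx = 1.0 / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * hx
        a = 1.0 - x                      # upper limit of y for this x
        hy = a / steps
        for j in range(steps):
            y = (j + 0.5) * hy
            total += (y - a / 3.0) ** 2 * 6.0 * (a - y) * hx * hy
    return total

s = sigma_yx_squared()
eta_squared = 1.0 - s / (3.0 / 80.0)     # from (8.15.6) with sigma_y^2 = 3/80
print(s, eta_squared)                    # to be compared with 1/30 and 1/9
```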


8.16 EXERCISES 


1. Two points X, Y are independently chosen at random on a line segment
of length a. Compute the expectation of | X- Y |". 

2. If $(X, Y)$ has the normal distribution in two dimensions with zero means,
unit variances and correlation coefficient $\rho$, then prove that the expectation of
the greater of $X$ and $Y$ is $\sqrt{(1 - \rho)/\pi}$.




3. If $X_1, X_2, \ldots, X_n$ are mutually independent standard normal variates, then
show that the mean value of $\min(|X_1|, |X_2|, \ldots, |X_n|)$ is


$2^n \int_0^{\infty} [1 - \Phi(x)]^n\,dx$
where $\Phi(x)$ denotes the standard normal distribution function.


4. A ball is drawn at random from an urn containing 3 white balls num- 
bered 0, 1, 2, 2 red balls numbered 0, 1 and 1 black ball numbered 0. If the 
colours white, red and black are again numbered 0, 1 and 2 respectively, find 
the correlation coefficient between the variates— X, the colour number and Y, 


the number of the ball. Also write down the least square regression lines 
of Y on X and X on Y. 


5. In Ex. 1 Sec. 6.8 find the regression lines of the joint distribution of the
random variables, the number of the ball and the colour number, and obtain
a measure of goodness of fit.


6. Let $X$ and $Y$ respectively denote the number of heads and the longest
run of heads in four tosses of a coin. Compute the means, variances and the
correlation coefficient.


7. When two dice are thrown, $X$ denotes the number on the first die and $Y$
the greater of the two numbers on the dice. Compute the correlation coefficient
between $X$ and $Y$.


8. Calculate $m_x$, $m_y$ and $\alpha_{11}$ for the joint distribution of the number of
the ball and that of the colour in Ex. 2 Sec. 6.8. Hence find the covariance
of the variates, and explain the value obtained.


9. A ball is drawn from an urn containing 3 balls marked with numbers 0,
1, 2 and having white, red and black colours respectively. If we call white, red
and black the first, second and third colour respectively, show that the correlation
coefficient between $X$, the number of the colour, and $Y$, that of the ball,
is unity, and hence obtain $Y$ as a function of $X$.


10. The probability density function of a continuous bivariate distribution is
given by
$f(x, y) = x + y$ for $0 < x < 1$, $0 < y < 1$
$= 0$ elsewhere
Find the values of $m_x$, $m_y$, $\sigma_x$, $\sigma_y$, $\rho$ and write down the regression lines. Also
find the regression curves for the means.


11. For the continuous bivariate distribution defined in Ex. 8 Sec. 6.8, find
the regression curves for the means, and also the least square regression
lines.




12. Show that the acute angle $\phi$ between the least square regression lines
is given by


$\tan\phi = \frac{1 - \rho^2}{|\rho|}\cdot\frac{\sigma_x\sigma_y}{\sigma_x^2 + \sigma_y^2}$


and discuss the cases $\rho = 0$ and $\rho = \pm 1$.
13. If the regression lines are 
x+6y=6 and 3x+2y=10 
find the means and the correlation coefficient. 


14. If points are assigned to the different suits of cards as follows: 1 for
spade, 2 for heart, 3 for diamond and 4 for club, find the mathematical 
expectation of the total number of points in a bridge hand of 13 cards. 

15. Two points are independently chosen at random on two adjacent sides of a 
square, the length of a side being a. Find the mean area of the triangle formed 
by the line joining the two random points and the sides of the square. 

16. Prove Schwarz's inequality for expectations that $[E(XY)]^2 \le E(X^2)\,E(Y^2)$,
and hence deduce that $-1 \le \rho(X, Y) \le 1$.

17. If $a$, $b$, $c$ are positive constants, show that the correlation coefficient
between $aX + bY$ and $cY$ is
$\frac{a\rho\sigma_x + b\sigma_y}{\sqrt{a^2\sigma_x^2 + b^2\sigma_y^2 + 2ab\rho\sigma_x\sigma_y}}$


where $\rho = \rho(X, Y)$.

18. If $X$ and $Y$ are uncorrelated, find the correlation coefficient between
the linear combinations $a_1X + b_1Y$ and $a_2X + b_2Y$.

19. Prove that the linear combinations $aX + bY$ and $cX + dY$ of the random
variables $X$, $Y$ are uncorrelated if $ac\sigma_x^2 + (ad + bc)\rho\sigma_x\sigma_y + bd\sigma_y^2 = 0$.

20. The random variables X, Y are connected by the linear relation 
aX+bY+c=0. Prove that the correlation coefficient between X and Y is -1 
if a, b have the same sign and 1 if a, b have opposite signs. 

21. In the theorem of Sec. 8.3 show that if $U$, $V$ are uncorrelated, then
$\sigma_u^2 + \sigma_v^2 = \sigma_x^2 + \sigma_y^2$, $\sigma_u\sigma_v = \sigma_x\sigma_y\sqrt{1 - \rho^2}$

22. If for any pair of linearly dependent random variables $X$, $Y$ we set

$U = X\cos\alpha + Y\sin\alpha$, $V = -X\sin\alpha + Y\cos\alpha$ ($\alpha$, a constant)


then prove that $V$ will be constant (i.e. has a one-point distribution) if
$\tan\alpha = \rho\sigma_y/\sigma_x$.

23. If the joint distribution of $X$ and $Y$ is the bivariate normal distribution,
then show that


are independent normal variates having variances $2(1 + \rho)$ and $2(1 - \rho)$ respectively,
where $\rho = \rho(X, Y)$.


24. Show that the mean and variance of the number of successes in a Poisson
sequence of $n$ trials are $\sum p_i$ and $\sum p_iq_i$ respectively, where $p_i$ denotes the probability
of success in the ith trial and $q_i = 1 - p_i$ $(i = 1, 2, \ldots, n)$.


25. Find the mean and variance of the total number of aces, kings, queens
and jacks obtained by a bridge hand. 

26. An urn contains $N$ tickets numbered $1, 2, \ldots, N$, from which $m$ tickets
are drawn successively without replacement. Find the mean and variance of 
the sum of the numbers on the tickets drawn. 


27. In Ex.5 Sec, 32 show that the mean and variance of the number of 
matches are both unity. 


28. If the mutually independent random variables $X_1, X_2, \ldots, X_n$ all have
the same distribution and their sum $X_1 + X_2 + \cdots + X_n$ is normally distributed,
then show that each of them is normally distributed.


29. Let $X_1, X_2, \ldots, X_n$ be mutually independent random variables
having cumulants $\kappa_r^{(1)}, \kappa_r^{(2)}, \ldots, \kappa_r^{(n)}$ respectively. Prove that the linear combination
$X = a_1X_1 + a_2X_2 + \cdots + a_nX_n$ has cumulants $\kappa_r$ given by


$\kappa_r = a_1^r\kappa_r^{(1)} + a_2^r\kappa_r^{(2)} + \cdots + a_n^r\kappa_r^{(n)}$
Hence deduce that the mean $m$ and variance $\sigma^2$ of $X$ are given by
$m = a_1m_1 + a_2m_2 + \cdots + a_nm_n$, $\sigma^2 = a_1^2\sigma_1^2 + a_2^2\sigma_2^2 + \cdots + a_n^2\sigma_n^2$
where $m_i$ and $\sigma_i^2$ respectively denote the mean and variance of $X_i$ $(i = 1, 2, \ldots, n)$.


30. A gambler plays a game of chance with his opponent, in which the pro- 
bability of his win is 4 at a stake of 1 rupee. If he starts with a capital 
of 5 rupees, find the probability that the gambler will just completely ruin his
capital after 9 games. (Assume that if the capital of the gambler is exhausted 
before 9 games are played, he can be given loan so that the play is not stopped.) 


31. The joint probability distribution of two discrete random variables $X$,
$Y$ is given by


$P(X = i, Y = j) = p_{ij}$ $(i, j = 0, 1)$


Find the joint characteristic function of $X$ and $Y$ and their individual characteristic
functions, and using these prove that $X$, $Y$ are independent if


$p_{00}p_{11} = p_{01}p_{10}$

32. Prove the formula
$\sigma^2(V_y) = E(YV_y)$
for the least square linear regression of $Y$ on $X$. Deduce that


$\sigma^2(V_y) = \alpha_{02} - c_0^*\alpha_{01} - c_1^*\alpha_{11}$




33. Fit a parabola of the form $y = c_0 + c_1x + c_2x^2$ to the joint distribution
of $X$ and $Y$ defined in Ex. 4 by the principle of least squares, and find a
measure of goodness of fit.

34. Find the least square regression parabola of the second degree of $Y$
on $X$ for the bivariate normal distribution with zero means, and account for
your result.

35. The joint probability density function of X and Y is àx'e-''*? 
$(0 < x < \infty,\ 0 < y < \infty)$. Determine the correlation ratio of $Y$ on $X$.

36. For any continuous function $g(X)$, prove that

$\rho\{g(X), Y - m_y(X)\} = 0$
Use this fact to show that
$E[\{m_y(X) - c_0^* - c_1^*X\}^2] = \sigma_y^2(\eta_{yx}^2 - \rho^2)$
Hence deduce that $\eta_{yx} \ge |\rho|$ and that $\eta_{yx}^2 - \rho^2$ is a measure of separation between
the regression curve for the mean of $Y$ and the corresponding regression line.
Interpret the case $\eta_{yx} = |\rho|$.


CHAPTER 9
SPECIAL DISTRIBUTIONS


In this chapter we shall study three continuous distributions, viz.
the $\chi^2$, $t$ and $F$-distributions, which are particularly important for
their applications in statistics. Now it is customary in statistics to
denote the random variable associated with these distributions by the
same letter as the distribution itself. We shall also imitate this
practice here, and speak, for example, of a random variable $\chi^2$ having
a $\chi^2$-distribution, the corresponding running real variable being again
denoted by $\chi^2$ in keeping with our usual convention.


9.1 $\chi^2$-DISTRIBUTION

The spectrum consists of the positive half of the real axis, and the
density function is defined by

$$f(\chi^2) = \begin{cases} \dfrac{e^{-\chi^2/2}\,(\tfrac12\chi^2)^{n/2-1}}{2\,\Gamma(\tfrac12 n)} & \chi^2 > 0 \\[2mm] 0 & \chi^2 \le 0 \end{cases} \tag{9.1.1}$$

$n$, the only parameter of the distribution, is a positive integer called
the number of degrees of freedom of the distribution. A $\chi^2$-distribution
with $n$ degrees of freedom will be briefly referred to as a $\chi^2(n)$
distribution.

The above form of the density function reminds one of the 
gamma distribution, and, in fact, we have the following theorem. 

Theorem I. If $X$ is a $\gamma(\tfrac12 n)$ variate, then $Y = 2X$ has a
$\chi^2$-distribution with $n$ degrees of freedom, and conversely if $Y$ is a
$\chi^2(n)$ variate, then $X = \tfrac12 Y$ is a $\gamma(\tfrac12 n)$ variate.

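Proof. The probability differential

$$dP = \frac{e^{-x}x^{n/2-1}}{\Gamma(\tfrac12 n)}\,dx = \frac{e^{-y/2}(\tfrac12 y)^{n/2-1}}{\Gamma(\tfrac12 n)}\,d(\tfrac12 y) = \frac{e^{-y/2}(\tfrac12 y)^{n/2-1}}{2\,\Gamma(\tfrac12 n)}\,dy \qquad (0 < y < \infty)$$

which shows that $Y$ has a $\chi^2(n)$ distribution. The proof of the
converse will be similar.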




The next theorem shows how the $\chi^2$-distribution arises naturally
from the normal distribution.

Theorem II. If $X_1, X_2, \ldots, X_n$ are $n$ mutually independent standard
normal variates, then the sum of their squares $X_1^2 + X_2^2 + \cdots + X_n^2$ is
$\chi^2$-distributed with $n$ degrees of freedom.

Proof. Since each $X_i$ is normal (0, 1), $\tfrac12 X_i^2$ is a $\gamma(\tfrac12)$ variate
(cf. Ex. 3 Sec. 5.9). Now $\tfrac12 X_1^2, \tfrac12 X_2^2, \ldots, \tfrac12 X_n^2$ are mutually
independent $\gamma$-variates each with parameter $\tfrac12$, so that, by the reproductive
property of the $\gamma$-distribution, their sum $\tfrac12(X_1^2 + X_2^2 + \cdots + X_n^2)$
is a $\gamma(\tfrac12 n)$ variate, and hence, by Theorem I, $X_1^2 + X_2^2 + \cdots + X_n^2$
is $\chi^2$-distributed with $n$ degrees of freedom.

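Theorem III. Let $X_1, X_2, \ldots, X_n$ be $n$ mutually independent
standard normal variates. If

$$\begin{aligned} &a_{11}X_1 + a_{12}X_2 + \cdots + a_{1n}X_n \\ &a_{21}X_1 + a_{22}X_2 + \cdots + a_{2n}X_n \\ &\qquad\cdots\cdots\cdots \\ &a_{m1}X_1 + a_{m2}X_2 + \cdots + a_{mn}X_n \end{aligned} \tag{9.1.2}$$

be $m$ ($< n$) given linear combinations such that their coefficients satisfy
the orthogonality relations:

$$\sum_{\alpha=1}^{n} a_{i\alpha}a_{j\alpha} = \delta_{ij} \qquad (i, j = 1, 2, \ldots, m) \tag{9.1.3}$$

where $\delta_{ij}$ is the well-known Kronecker symbol, i.e.

$$\delta_{ij} = \begin{cases} 1 & i = j \\ 0 & i \ne j \end{cases}$$

then the quadratic form $Q = Q(X_1, X_2, \ldots, X_n)$ given by

$$Q = \sum_{\alpha=1}^{n} X_\alpha^2 - \sum_{\beta=1}^{m} (a_{\beta 1}X_1 + a_{\beta 2}X_2 + \cdots + a_{\beta n}X_n)^2 \tag{9.1.4}$$

is $\chi^2$-distributed with $n - m$ degrees of freedom, and $Q$ is independent
of the given linear combinations.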


Let us first prove the following lemma. 




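LEMMA. If $X_1, X_2, \ldots, X_n$ are mutually independent standard
normal variates, and $Y_1, Y_2, \ldots, Y_n$ are obtained by an orthogonal
homogeneous linear transformation:

$$Y_i = \sum_{\alpha=1}^{n} a_{i\alpha}X_\alpha \qquad (i = 1, 2, \ldots, n) \tag{9.1.5}$$

$$\sum_{\alpha=1}^{n} a_{i\alpha}a_{j\alpha} = \sum_{\alpha=1}^{n} a_{\alpha i}a_{\alpha j} = \delta_{ij} \qquad (i, j = 1, 2, \ldots, n) \tag{9.1.6}$$

then $Y_1, Y_2, \ldots, Y_n$ are also mutually independent standard normal
variates.

Proof. Set

$$y_i = \sum_{\alpha=1}^{n} a_{i\alpha}x_\alpha \qquad (i = 1, 2, \ldots, n)$$

It follows immediately from the orthogonality relations that

$$\sum_{i=1}^{n} y_i^2 = \sum_{i=1}^{n} x_i^2, \qquad \left|\frac{\partial(x_1, x_2, \ldots, x_n)}{\partial(y_1, y_2, \ldots, y_n)}\right| = 1$$

Since $X_1, X_2, \ldots, X_n$ are mutually independent each normal (0, 1),
the probability differential

$$\begin{aligned} dP &= (2\pi)^{-n/2}\, e^{-\frac12\sum x_i^2}\, dx_1\,dx_2 \cdots dx_n \\ &= (2\pi)^{-n/2}\, e^{-\frac12\sum y_i^2} \left|\frac{\partial(x_1, \ldots, x_n)}{\partial(y_1, \ldots, y_n)}\right| dy_1\,dy_2 \cdots dy_n \\ &= \frac{1}{\sqrt{2\pi}}e^{-y_1^2/2}\,dy_1 \cdot \frac{1}{\sqrt{2\pi}}e^{-y_2^2/2}\,dy_2 \cdots \frac{1}{\sqrt{2\pi}}e^{-y_n^2/2}\,dy_n \end{aligned}$$

This form of the probability differential proves the lemma.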


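Proof of the theorem. We know from the theory of matrices
that given $m \times n$ coefficients $a_{ij}$ ($i = 1, 2, \ldots, m$; $j = 1, 2, \ldots, n$) of (9.1.2)
subject to (9.1.3), we can always determine the rest of the $n^2$ numbers
$a_{ij}$ ($i, j = 1, 2, \ldots, n$) such that the complete set of orthogonality relations
(9.1.6) holds giving an orthogonal transformation (9.1.5). Of these
$Y$'s, we note, $Y_1, Y_2, \ldots, Y_m$ are the given linear combinations of
(9.1.2).

Now $\sum_{\alpha=1}^{n} X_\alpha^2 = \sum_{\alpha=1}^{n} Y_\alpha^2$, and hence

$$Q = \sum_{\alpha=1}^{n} Y_\alpha^2 - \sum_{\beta=1}^{m} Y_\beta^2 = Y_{m+1}^2 + Y_{m+2}^2 + \cdots + Y_n^2$$

Since, by the lemma, $Y_1, Y_2, \ldots, Y_n$ are mutually independent
standard normal variates, it follows from Theorem III (a) Sec. 6.7 that
$Y_{m+1}, Y_{m+2}, \ldots, Y_n$ are $n - m$ mutually independent standard normal
variates which are independent of $Y_1, Y_2, \ldots, Y_m$. Hence, by Theorem II,
$Q$ is $\chi^2$-distributed with $n - m$ degrees of freedom and is independent
of the given linear combinations.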

Example. Let the Cartesian co-ordinates $(X, Y, Z)$ of a random
point in space be mutually independent, each of which is normal (0, 1).
By Theorem II, the square of the distance of the random point from
the origin, $X^2 + Y^2 + Z^2$, is $\chi^2$-distributed with 3 degrees of freedom.
Now consider any plane $lx + my + nz = 0$ ($l^2 + m^2 + n^2 = 1$) passing
through the origin. The square of the distance from the origin of
the foot of the perpendicular from the random point to the
plane is $X^2 + Y^2 + Z^2 - (lX + mY + nZ)^2$ which, by Theorem III, is
$\chi^2$-distributed with 2 degrees of freedom. This example might give a
glimpse into the meaning of the term ‘degrees of freedom’.
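
The geometry of this example can be checked numerically. The following is only a sketch under stated assumptions: the unit normal (1/3, 2/3, 2/3), the seed and the sample size are arbitrary illustrative choices, and the function names are hypothetical. It verifies that the quadratic form of Theorem III coincides with the squared distance of the foot of the perpendicular, and that its sample mean is close to 2, the mean of a distribution with 2 degrees of freedom.

```python
import random

def q_form(x, y, z, l, m, n):
    """Q = X^2 + Y^2 + Z^2 - (lX + mY + nZ)^2 for a unit normal (l, m, n)."""
    return x * x + y * y + z * z - (l * x + m * y + n * z) ** 2

def foot_distance_sq(x, y, z, l, m, n):
    """Squared distance from the origin of the foot of the perpendicular
    dropped from (x, y, z) onto the plane lx + my + nz = 0."""
    d = l * x + m * y + n * z                      # signed distance from the plane
    fx, fy, fz = x - d * l, y - d * m, z - d * n   # foot of the perpendicular
    return fx * fx + fy * fy + fz * fz

# Hypothetical unit normal, chosen only for illustration (l^2 + m^2 + n^2 = 1).
l, m, n = 1 / 3, 2 / 3, 2 / 3

rng = random.Random(12345)                         # fixed seed for repeatability
samples = [(rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1))
           for _ in range(20000)]

# Largest discrepancy between the two expressions over all sampled points.
max_gap = max(abs(q_form(*p, l, m, n) - foot_distance_sq(*p, l, m, n))
              for p in samples)
# Sample means: about 2 for Q and about 3 for X^2 + Y^2 + Z^2.
mean_q = sum(q_form(*p, l, m, n) for p in samples) / len(samples)
mean_r2 = sum(x * x + y * y + z * z for x, y, z in samples) / len(samples)
```

Removing the component along the normal costs exactly one degree of freedom, which is what the difference between the two sample means reflects.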


Characteristics of the $\chi^2$-distribution
We know, if $X$ has a $\gamma(\tfrac12 n)$ distribution, $\chi^2 = 2X$ has a
$\chi^2$-distribution with $n$ degrees of freedom. Hence the moments of
the $\chi^2(n)$ distribution are given by

$$\alpha_k = 2^k\alpha_k(X) = 2^k\,\tfrac12 n(\tfrac12 n + 1)(\tfrac12 n + 2)\cdots(\tfrac12 n + k - 1)$$

or

$$\alpha_k = n(n + 2)(n + 4)\cdots(n + 2k - 2) \tag{9.1.7}$$

Hence mean $m = \alpha_1 = n$, $\alpha_2 = n(n + 2)$ and variance $\sigma^2 = n(n + 2) - n^2 = 2n$
etc.
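
The moment formula (9.1.7) can be compared with a direct numerical integration of the density (9.1.1). This is a minimal sketch, not part of the original derivation: the midpoint rule, the cut-off at 200 and the choice n = 5 are assumptions made only for illustration, and the function names are hypothetical.

```python
import math

def chi2_pdf(x, n):
    """Density (9.1.1) of the chi-square distribution with n degrees of freedom."""
    if x <= 0:
        return 0.0
    return math.exp(-x / 2) * (x / 2) ** (n / 2 - 1) / (2 * math.gamma(n / 2))

def moment_formula(n, k):
    """k-th moment about the origin from (9.1.7): n(n+2)...(n+2k-2)."""
    product = 1
    for j in range(k):
        product *= n + 2 * j
    return product

def moment_numeric(n, k, upper=200.0, steps=200000):
    """Midpoint-rule approximation to the integral of x^k f(x) over (0, upper)."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += x ** k * chi2_pdf(x, n)
    return total * h

n = 5
alpha1 = moment_numeric(n, 1)          # should be close to n
alpha2 = moment_numeric(n, 2)          # should be close to n(n + 2)
variance = alpha2 - alpha1 ** 2        # should be close to 2n
```

For other values of n the same functions can be reused; for n = 1 the integrand is unbounded near the origin, so a finer grid would be needed there.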

For $n \le 2$, the density function $f(\chi^2)$ is monotonic decreasing in
$0 < \chi^2 < \infty$, and as such the distribution has no mode; but for $n > 2$,
$f(\chi^2)$ has a single maximum at $\chi^2 = n - 2$, i.e. the distribution is
unimodal with mode $M = n - 2$.

Fig. 20. Typical $\chi^2$-Density Curve

The characteristic function

$$\chi(t) = \chi_X(2t) = (1 - 2it)^{-n/2} \tag{9.1.8}$$

This form of the characteristic function at once suggests the following
reproductive property of the $\chi^2$-distribution. If $\chi_1^2, \chi_2^2, \ldots, \chi_k^2$ are
mutually independent $\chi^2$-variates with degrees of freedom $n_1, n_2, \ldots, n_k$
respectively, then their sum $\chi_1^2 + \chi_2^2 + \cdots + \chi_k^2$ is also $\chi^2$-distributed
with $n_1 + n_2 + \cdots + n_k$ degrees of freedom.


9.2 t-DISTRIBUTION
The $t$-distribution or Student's distribution (Student was the
pseudonym of a statistician W. S. Gosset) is given by

$$f(t) = \frac{1}{\sqrt{n}\,B(\tfrac12, \tfrac12 n)}\left(1 + \frac{t^2}{n}\right)^{-(n+1)/2} \qquad (-\infty < t < \infty) \tag{9.2.1}$$

where the parameter $n$, as in the $\chi^2$-distribution, is a positive
integer called the number of degrees of freedom of the distribution.


Theorem I. If $X$ is a standard normal variate, $\chi^2$ is $\chi^2$-distributed
with $n$ degrees of freedom, and $X$ and $\chi^2$ are independent, then

$$t = \frac{X}{\sqrt{\chi^2/n}}$$

has a $t$-distribution with $n$ degrees of freedom.


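Proof. We write $Y = X/\sqrt{\chi^2/n}$, so that

$$\frac{Y^2}{n} = \frac{\tfrac12 X^2}{\tfrac12\chi^2}$$

Since $X$ and $\chi^2$ are independent, $\tfrac12 X^2$ and $\tfrac12\chi^2$ are also independent,
and both are $\gamma$-variates, the former with parameter $\tfrac12$ and the latter
with parameter $\tfrac12 n$, and hence their quotient $Y^2/n$ is a $\beta_2(\tfrac12, \tfrac12 n)$
variate (cf. Ex. 4 Sec. 6.6), so that the probability differential

$$dP = \frac{(y^2/n)^{-1/2}}{B(\tfrac12, \tfrac12 n)(1 + y^2/n)^{(n+1)/2}}\,d(y^2/n) = \frac{2\,dy}{\sqrt{n}\,B(\tfrac12, \tfrac12 n)(1 + y^2/n)^{(n+1)/2}} \qquad (y > 0)$$

Now as $y$ ranges from $-\infty$ to $\infty$, $y^2/n$ traverses the interval
$(0, \infty)$ twice, and hence

$$f(y) = \frac{1}{\sqrt{n}\,B(\tfrac12, \tfrac12 n)(1 + y^2/n)^{(n+1)/2}} \qquad (-\infty < y < \infty)$$

Hence the theorem.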


Characteristics of the $t$-distribution

The $t$-distribution is symmetrical about the origin. For $n = 1$,
$f(t) = 1/\pi(1 + t^2)$ which is, in fact, the density function of the Cauchy
distribution with parameters $\lambda = 1$ and $\mu = 0$; for this distribution,
we know, the mean does not exist. For $n > 1$, however, the mean of
the $t$-distribution exists and, on account of symmetry, the mean, mode
and the median are all zero.

Fig. 21. $t$-Density Curve
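
Two of the statements above lend themselves to a quick numerical check: that the density (9.2.1) reduces to the Cauchy density for one degree of freedom, and that it carries unit total mass. The sketch below is illustrative only; the midpoint rule, the integration range and the choice of 5 degrees of freedom are assumptions, and the function names are hypothetical.

```python
import math

def beta(a, b):
    """Beta function expressed through gamma functions."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)

def t_pdf(t, n):
    """Density (9.2.1) of the t-distribution with n degrees of freedom."""
    return 1.0 / (math.sqrt(n) * beta(0.5, n / 2)
                  * (1 + t * t / n) ** ((n + 1) / 2))

def cauchy_pdf(t):
    """Cauchy density with parameters lambda = 1, mu = 0."""
    return 1.0 / (math.pi * (1 + t * t))

def integrate(f, lo, hi, steps=200000):
    """Midpoint-rule approximation to the integral of f over (lo, hi)."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))

# Total probability mass for 5 degrees of freedom (tails beyond 200 are negligible).
total_mass = integrate(lambda t: t_pdf(t, 5), -200.0, 200.0)
# For n = 1 the two densities should agree at any point, e.g. t = 1.3.
cauchy_gap = abs(t_pdf(1.3, 1) - cauchy_pdf(1.3))
```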




9.3 F-DISTRIBUTION

Here

$$f(F) = \begin{cases} \dfrac{m^{m/2}\,n^{n/2}\,F^{m/2-1}}{B(\tfrac12 m, \tfrac12 n)(mF + n)^{(m+n)/2}} & F > 0 \\[2mm] 0 & F \le 0 \end{cases} \tag{9.3.1}$$

where $m$, $n$, both positive integers, are the two parameters of the
distribution. The random variable will be called an $F(m, n)$ variate.
We note that the distribution is not symmetrical with respect to the
parameters $m$ and $n$.
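Theorem I. If $\chi_1^2$ and $\chi_2^2$ are independent variates having
$\chi^2$-distribution with $m$ and $n$ degrees of freedom respectively, then

$$X = \frac{\chi_1^2/m}{\chi_2^2/n} = \frac{n\chi_1^2}{m\chi_2^2}$$

is an $F(m, n)$ variate.

Proof. Write

$$\frac{m}{n}X = \frac{\tfrac12\chi_1^2}{\tfrac12\chi_2^2}$$

Now $\tfrac12\chi_1^2$ and $\tfrac12\chi_2^2$ are independent $\gamma(\tfrac12 m)$ and $\gamma(\tfrac12 n)$ variates
respectively, and hence $\frac{m}{n}X$ has a $\beta_2(\tfrac12 m, \tfrac12 n)$ distribution. Hence

$$dP = \frac{(mx/n)^{m/2-1}}{B(\tfrac12 m, \tfrac12 n)(1 + mx/n)^{(m+n)/2}}\,d(mx/n) = \frac{m^{m/2}\,n^{n/2}\,x^{m/2-1}}{B(\tfrac12 m, \tfrac12 n)(mx + n)^{(m+n)/2}}\,dx \qquad (0 < x < \infty)$$

which proves the theorem.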
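Theorem II. If $F$ is an $F(m, n)$ variate, then $X = 1/F$ is an $F(n, m)$
variate.

Proof. Setting $x = 1/F$, the probability differential

$$\begin{aligned} dP &= \frac{m^{m/2}\,n^{n/2}\,F^{m/2-1}}{B(\tfrac12 m, \tfrac12 n)(mF + n)^{(m+n)/2}}\,dF \\ &= \frac{m^{m/2}\,n^{n/2}\,F^{m/2-1}}{B(\tfrac12 m, \tfrac12 n)(mF + n)^{(m+n)/2}}\left|\frac{dF}{dx}\right| dx \\ &= \frac{n^{n/2}\,m^{m/2}\,x^{n/2-1}}{B(\tfrac12 n, \tfrac12 m)(nx + m)^{(m+n)/2}}\,dx \qquad (0 < x < \infty) \end{aligned}$$

Hence the theorem.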


Characteristics of the $F$-distribution
It may be easily seen that the mean exists only for $n > 2$, and
its value is $\dfrac{n}{n-2}$ which is independent of the parameter $m$ and
is greater than 1.

For $m > 2$, the distribution has a unique mode at the point

$$\frac{n(m - 2)}{m(n + 2)}$$

Fig. 22. Typical $F$-Density Curve
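
The reciprocal property of Theorem II and the stated mean and mode can be checked numerically from the density (9.3.1). This is a sketch under stated assumptions: the parameter values m = 6, n = 10, the evaluation point 0.7 and the midpoint integration up to 500 are arbitrary illustrative choices, and the function names are hypothetical.

```python
import math

def beta(a, b):
    """Beta function expressed through gamma functions."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)

def f_pdf(x, m, n):
    """Density (9.3.1) of an F(m, n) variate."""
    if x <= 0:
        return 0.0
    numerator = m ** (m / 2) * n ** (n / 2) * x ** (m / 2 - 1)
    denominator = beta(m / 2, n / 2) * (m * x + n) ** ((m + n) / 2)
    return numerator / denominator

def f_mode(m, n):
    """Mode n(m - 2) / (m(n + 2)), valid for m > 2."""
    return n * (m - 2) / (m * (n + 2))

def f_mean_numeric(m, n, upper=500.0, steps=200000):
    """Midpoint-rule approximation to the mean (requires n > 2)."""
    h = upper / steps
    return h * sum(((i + 0.5) * h) * f_pdf((i + 0.5) * h, m, n)
                   for i in range(steps))

m, n = 6, 10
x = 0.7
lhs = f_pdf(x, n, m)                    # F(n, m) density at x
rhs = f_pdf(1 / x, m, n) / x ** 2       # F(m, n) density at 1/x times |dF/dx|
mode = f_mode(m, n)
mean = f_mean_numeric(m, n)             # should be close to n / (n - 2) = 1.25
```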


9.4 EXERCISES 

1. If $X_1$, $X_2$, $X_3$ are mutually independent normal variates each having
mean zero and standard deviation $\sigma$, find the distribution of the sum of their
squares.

2. Assume that the three velocity components $(V_x, V_y, V_z)$ of any molecule
of a gas are mutually independent random variables, each being normal $(0, \sqrt{kT/m})$
where $k$ is Boltzmann's constant, $m$ the mass of a molecule and $T$ the absolute
temperature of the gas. Prove that the magnitude of the velocity $V$ has the
Maxwell-Boltzmann probability density function

$$\alpha v^2 e^{-\beta v^2} \qquad (0 < v < \infty)$$

where

$$\alpha = \sqrt{\frac{2}{\pi}}\left(\frac{m}{kT}\right)^{3/2}, \qquad \beta = \frac{m}{2kT}$$

Find also the distribution of the kinetic energy of a gas molecule.


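3. If $(X, Y)$ has the general bivariate normal distribution, show that

$$\frac{1}{1 - \rho^2}\left\{\frac{(X - m_x)^2}{\sigma_x^2} - \frac{2\rho(X - m_x)(Y - m_y)}{\sigma_x\sigma_y} + \frac{(Y - m_y)^2}{\sigma_y^2}\right\}$$

has a $\chi^2$-distribution with 2 degrees of freedom.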




4. Show, using Theorem I Sec. 9.1, that for the $\chi^2$-distribution with $n$ degrees
of freedom $\kappa_k = 2^{k-1}(k-1)!\,n$, and hence obtain the coefficient of skewness $\gamma_1$
and the coefficient of excess $\gamma_2$.

5. If $X$ and $Y$ are independent variates, $X$ being $\chi^2$-distributed with $m$ degrees
of freedom and their sum $X + Y$ $\chi^2$-distributed with $m + n$ degrees of freedom,
then show that $Y$ is $\chi^2$-distributed with $n$ degrees of freedom.


6. For the t-distribution with n degrees of freedom, prove that the variance 
exists only for n > 2 and that its value is n/(n—2). 

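7. Calculate the coefficient of excess $\gamma_2$ for the $t(n)$ distribution, and show
that it tends to zero as $n$ tends to infinity.

8. From $\int_0^\infty f(F)\,dF = 1$ obtain the identity

$$\int_0^\infty \frac{F^{m/2-1}\,dF}{(mF + n)^{(m+n)/2}} = \frac{B(\tfrac12 m, \tfrac12 n)}{m^{m/2}\,n^{n/2}}$$

Use this identity to show that

$$\alpha_k = \frac{\Gamma(\tfrac12 m + k)\,\Gamma(\tfrac12 n - k)}{\Gamma(\tfrac12 m)\,\Gamma(\tfrac12 n)}\left(\frac{n}{m}\right)^k \qquad (k < \tfrac12 n)$$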


9. If $t$ has a $t$-distribution with $n$ degrees of freedom, then show that $t^2$ is
an $F(1, n)$ variate.

10. If $\chi_1^2$, $\chi_2^2$ are independent $\chi^2$-variates having $m$ and $n$ degrees of freedom
respectively, find the distribution of $\chi_1^2/\chi_2^2$.


CHAPTER 10
CONVERGENCE ‘IN PROBABILITY’ 


We start with the following fundamental inequality. 


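10.1 TCHEBYCHEFF’S INEQUALITY
If $X$ is any random variable having a finite variance, then for
any $\varepsilon > 0$

$$P(|X - m| \ge \varepsilon) \le \frac{\sigma^2}{\varepsilon^2} \tag{10.1.1}$$

where $m$ and $\sigma$ respectively denote the mean and standard deviation
of $X$. (We note that the existence of the variance implies that of
the mean.)

Proof. CONTINUOUS CASE. We have

$$P(|X - m| \ge \varepsilon) = \int_{|x-m| \ge \varepsilon} f(x)\,dx$$

Now in the range of integration $1 \le (x - m)^2/\varepsilon^2$, and hence

$$\text{R.H.S.} \le \frac{1}{\varepsilon^2}\int_{|x-m| \ge \varepsilon} (x - m)^2 f(x)\,dx$$

Since the integrand is nonnegative

$$\text{R.H.S.} \le \frac{1}{\varepsilon^2}\int_{-\infty}^{\infty} (x - m)^2 f(x)\,dx = \frac{\sigma^2}{\varepsilon^2}$$

DISCRETE CASE. The proof is similar to that in the continuous
case.

$$P(|X - m| \ge \varepsilon) = \sum_{|x_i - m| \ge \varepsilon} f_i \le \frac{1}{\varepsilon^2}\sum_{|x_i - m| \ge \varepsilon} (x_i - m)^2 f_i \le \frac{1}{\varepsilon^2}\sum_i (x_i - m)^2 f_i = \frac{\sigma^2}{\varepsilon^2}$$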

The inequality (10.1.1) may also be written as

$$P(|X - m| < \varepsilon) \ge 1 - \frac{\sigma^2}{\varepsilon^2} \tag{10.1.2}$$

for any $\varepsilon > 0$, or

$$P(|X - m| \ge \tau\sigma) \le \frac{1}{\tau^2} \tag{10.1.3}$$

for any $\tau > 0$.

Remark. Tchebycheff’s inequality brings out the significance of
the variance as a measure of dispersion about the mean somewhat
quantitatively. It states that the amount of probability mass outside
the interval $(m - \varepsilon, m + \varepsilon)$ is less than or equal to $\sigma^2/\varepsilon^2$ which is
obviously small, for a given $\varepsilon$, if the variance is small.


Example. Let $X$ be normal $(m, \sigma)$. Then by Tchebycheff's
inequality (10.1.3)

$$P(|X - m| \ge 2\sigma) \le \tfrac14$$

But, since $X^* = (X - m)/\sigma$ is normal (0, 1), we have

$$P(|X - m| \ge 2\sigma) = P(|X^*| \ge 2) = .0456$$

the numerical value being obtained from Table I at the end of the
book. This, however, shows that Tchebycheff's inequality gives
a rather poor bound for the probability in question.
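
The comparison made in this example can be repeated for other multiples of the standard deviation. The sketch below is only illustrative: it evaluates the normal tail through the error function of the Python standard library instead of Table I, and the function names are hypothetical.

```python
import math

def normal_cdf(x):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def normal_two_sided_tail(tau):
    """P(|X - m| >= tau * sigma) for a normal (m, sigma) variate."""
    return 2 * (1 - normal_cdf(tau))

def tchebycheff_bound(tau):
    """Upper bound 1 / tau^2 of (10.1.3), capped at 1."""
    return min(1.0, 1 / tau ** 2)

# (tau, exact normal tail, Tchebycheff bound) for a few multiples of sigma.
rows = [(tau, normal_two_sided_tail(tau), tchebycheff_bound(tau))
        for tau in (1.0, 2.0, 3.0)]
```

In every row the exact tail lies below the bound, and the gap widens as the multiple grows, which is the sense in which the bound is poor for the normal distribution.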


10.2 CONVERGENCE ‘IN PROBABILITY’ 


We shall now introduce a new concept of convergence, viz. convergence
in probability or stochastic convergence which is defined as follows.

A sequence of random variables $X_1, X_2, \ldots, X_n, \ldots$ is said to converge
in probability to a constant $a$, if for any $\varepsilon > 0$

$$\lim_{n\to\infty} P(|X_n - a| < \varepsilon) = 1 \tag{10.2.1}$$

or its equivalent

$$\lim_{n\to\infty} P(|X_n - a| \ge \varepsilon) = 0 \tag{10.2.2}$$

and we write

$$X_n \xrightarrow{\text{in } p} a \quad \text{as } n \to \infty$$

That is, speaking practically, as $n$ increases the probability mass of the
distribution of $X_n$ accumulates more and more about the point $a$.

If there exists a random variable $X$ such that $X_n - X \xrightarrow{\text{in } p} 0$ as $n \to \infty$,
then we say that the given sequence of random variables converges in
probability to the random variable $X$.

Remark. If a sequence of constants $a_n \to a$ as $n \to \infty$, then
regarding a constant as a random variable having a one-point distribution
at that point, we may also write $a_n \xrightarrow{\text{in } p} a$ as $n \to \infty$.
in p 


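Although the concept of convergence in probability is basically
different from that of ordinary convergence of a sequence of numbers,
the following simple rules hold for convergence in probability as well.

Let $X_n \xrightarrow{\text{in } p} a$ and $Y_n \xrightarrow{\text{in } p} b$ as $n \to \infty$. Then as $n \to \infty$

$$\text{(i)} \quad X_n \pm Y_n \xrightarrow{\text{in } p} a \pm b \tag{10.2.3}$$

$$\text{(ii)} \quad X_nY_n \xrightarrow{\text{in } p} ab \tag{10.2.4}$$

$$\text{(iii)} \quad X_n/Y_n \xrightarrow{\text{in } p} a/b, \ \text{provided } b \ne 0 \tag{10.2.5}$$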


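Proof. Let $A$, $B$, $C$ denote the events $|X_n - a| \ge \tfrac12\varepsilon$, $|Y_n - b| \ge \tfrac12\varepsilon$
and $|(X_n \pm Y_n) - (a \pm b)| \ge \varepsilon$ respectively for any given $\varepsilon > 0$. Then
the complementary events $\bar A$, $\bar B$, $\bar C$ are respectively $|X_n - a| < \tfrac12\varepsilon$,
$|Y_n - b| < \tfrac12\varepsilon$ and $|(X_n \pm Y_n) - (a \pm b)| < \varepsilon$. If $\bar A$ and $\bar B$ occur
simultaneously, then

$$|(X_n \pm Y_n) - (a \pm b)| \le |X_n - a| + |Y_n - b| < \tfrac12\varepsilon + \tfrac12\varepsilon = \varepsilon$$

i.e. $\bar C$ occurs, so that $\bar A\bar B$ implies $\bar C$ or $\bar A\bar B \subset \bar C$ or $C \subset A + B$. Hence

$$P(C) \le P(A + B) \le P(A) + P(B) \to 0 \quad \text{as } n \to \infty$$

since $X_n \xrightarrow{\text{in } p} a$, $Y_n \xrightarrow{\text{in } p} b$ as $n \to \infty$, so that $P(C) \to 0$ as $n \to \infty$. This
shows that $X_n \pm Y_n \xrightarrow{\text{in } p} a \pm b$ as $n \to \infty$.

By induction the above rule may be extended to a finite number
of sequences of random variables.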


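Next we prove that $cX_n \xrightarrow{\text{in } p} ca$ as $n \to \infty$, $c$ being a constant. If
$c = 0$, this is obvious. If $c \ne 0$, for any $\varepsilon > 0$

$$P(|cX_n - ca| \ge \varepsilon) = P(|X_n - a| \ge \varepsilon/|c|) \to 0 \quad \text{as } n \to \infty$$

as $X_n \xrightarrow{\text{in } p} a$ as $n \to \infty$, which proves the proposition.

If $Z_n \xrightarrow{\text{in } p} 0$ as $n \to \infty$, then $Z_n^2 \xrightarrow{\text{in } p} 0$ as $n \to \infty$, for

$$P(Z_n^2 \ge \varepsilon) = P(|Z_n| \ge \sqrt{\varepsilon}) \to 0 \quad \text{as } n \to \infty$$

Then if, as $n \to \infty$, $X_n \xrightarrow{\text{in } p} a$, then $X_n^2 \xrightarrow{\text{in } p} a^2$, for

$$X_n^2 = (X_n - a)^2 + 2a(X_n - a) + a^2 \xrightarrow{\text{in } p} a^2 \quad \text{as } n \to \infty$$

To prove (10.2.4) we note that

$$X_nY_n = \tfrac14[(X_n + Y_n)^2 - (X_n - Y_n)^2] \xrightarrow{\text{in } p} \tfrac14[(a + b)^2 - (a - b)^2] = ab \quad \text{as } n \to \infty$$

(10.2.5) will follow from (10.2.4) if we can show that $\dfrac{1}{Y_n} \xrightarrow{\text{in } p} \dfrac{1}{b}$ as
$n \to \infty$ ($b \ne 0$). To prove this we have

$$\left|\frac{1}{Y_n} - \frac{1}{b}\right| = \frac{|Y_n - b|}{|b|\,|Y_n|} \le \frac{|Y_n - b|}{|b|\,(|b| - |Y_n - b|)}$$

if $|Y_n - b| < |b|$. Let $A$ denote the event $|Y_n - b| < |b|$ so that $\bar A$ is
the event $|Y_n - b| \ge |b|$, and let, for a given $\varepsilon > 0$, $B$ denote the event
$\left|\dfrac{1}{Y_n} - \dfrac{1}{b}\right| \ge \varepsilon$. If $AB$ occurs, i.e. $A$ and $B$ occur simultaneously, then

$$\frac{|Y_n - b|}{|b|\,(|b| - |Y_n - b|)} \ge \varepsilon$$

or

$$|Y_n - b| \ge \frac{\varepsilon b^2}{1 + \varepsilon|b|}$$

Representing the last event by $C$, it follows that $AB \subset C$, and

$$B = AB + \bar AB \subset C + \bar A$$

So

$$P(B) \le P(C + \bar A) \le P(C) + P(\bar A) \to 0 \quad \text{as } n \to \infty$$

since $Y_n \xrightarrow{\text{in } p} b$ as $n \to \infty$, so that $P(B) \to 0$ as $n \to \infty$. This completes
the proof.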




As an immediate consequence of Tchebycheff’s inequality, we have 
the following theorem regarding convergence in probability. 


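Tchebycheff's theorem. Let $X_1, X_2, \ldots, X_n, \ldots$ be a sequence of
random variables such that the mean $m_n$ and standard deviation $\sigma_n$ of
$X_n$ exist for all $n$. If $\sigma_n \to 0$ as $n \to \infty$, then

$$X_n - m_n \xrightarrow{\text{in } p} 0 \quad \text{as } n \to \infty$$

Proof. By (10.1.1), for any $\varepsilon > 0$,

$$P(|X_n - m_n| \ge \varepsilon) \le \sigma_n^2/\varepsilon^2 \to 0 \quad \text{as } n \to \infty$$

if $\sigma_n \to 0$ as $n \to \infty$. Hence $X_n - m_n \xrightarrow{\text{in } p} 0$ as $n \to \infty$.

COROLLARY. If, moreover, $m_n \to m$ as $n \to \infty$, then by the rule
(10.2.3) together with the preceding remark $X_n \xrightarrow{\text{in } p} m$ as $n \to \infty$.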
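Bernoulli's theorem. If $X_n$ is a binomial $(n, p)$ variate, then

$$\frac{X_n}{n} \xrightarrow{\text{in } p} p \quad \text{as } n \to \infty \tag{10.2.6}$$

Proof. Since $X_n$ is binomial $(n, p)$

$$E(X_n) = np, \qquad \sigma(X_n) = \sqrt{npq} \qquad (q = 1 - p)$$

Now for the sequence $X_n/n$, we have

$$E(X_n/n) = p, \qquad \sigma(X_n/n) = \sqrt{pq/n} \to 0 \quad \text{as } n \to \infty$$

Hence, by Tchebycheff's theorem, $X_n/n - p \xrightarrow{\text{in } p} 0$ as $n \to \infty$ whence the
theorem follows.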
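
The content of Bernoulli's theorem can be seen numerically by computing the exact probability of a deviation of the frequency ratio from p and comparing it with the Tchebycheff bound used in the proof. This is a sketch only: the values p = 0.3 and a tolerance of 0.05 are arbitrary illustrative choices, the function names are hypothetical, and the binomial masses are evaluated through logarithms to avoid overflow for large n.

```python
import math

def binom_pmf(k, n, p):
    """Binomial (n, p) probability mass at k, computed through logarithms."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1 - p))
    return math.exp(log_pmf)

def deviation_probability(n, p, eps):
    """P(|X_n / n - p| >= eps) for a binomial (n, p) variate X_n."""
    return sum(binom_pmf(k, n, p) for k in range(n + 1)
               if abs(k / n - p) >= eps)

def tchebycheff_bound(n, p, eps):
    """Bound pq / (n eps^2) from Tchebycheff's inequality, capped at 1."""
    return min(1.0, p * (1 - p) / (n * eps ** 2))

p, eps = 0.3, 0.05
sizes = (100, 1000, 10000)
exact = [deviation_probability(n, p, eps) for n in sizes]
bound = [tchebycheff_bound(n, p, eps) for n in sizes]
```

Both lists decrease towards zero as n grows, the exact probabilities much faster than the bounds.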

If $X_n$ denotes the number of successes in a Bernoullian sequence
of $n$ trials with probability of success $p$, then $X_n$ is binomial $(n, p)$
and $f = X_n/n$ is the frequency ratio of successes, in terms of which we
get another version of the above theorem stated as follows.

If $f$ is the frequency ratio of successes in a Bernoullian sequence
of $n$ trials with probability of success $p$, then

$$f \xrightarrow{\text{in } p} p \quad \text{as } n \to \infty \tag{10.2.7}$$

Consider now any random experiment $E$ in general. We stated
in the frequency interpretation of probability that if $E$ is repeated
under uniform conditions a large number of times, the frequency ratio
of any event will be approximately equal to its probability. This
practical sort of statement may be given a precise mathematical form
by means of Bernoulli’s theorem. Let the random variable $n(A)$
denote the frequency of any event $A$ in a sequence of $n$ repetitions of
$E$ so that its frequency ratio $f(A) = n(A)/n$. If we now mathematically
interpret a sequence of repetitions of $E$ under uniform conditions as a
sequence of independent trials of $E$, it follows that $n(A)$ is binomially
distributed with parameters $\{n, P(A)\}$, and Bernoulli's theorem states

$$f(A) \xrightarrow{\text{in } p} P(A) \quad \text{as } n \to \infty \tag{10.2.8}$$


Remark. It will be interesting to compare Bernoulli’s theorem 
(10.2.8) with the frequency definition of probability (2.3.2). As the 
number of repetitions of E increases, the latter states that we are 
certain that the frequency ratio of an event gets closer and closer 
to its probability, whereas the former states that we become more 
and more sure that the frequency ratio will lie in a fixed small 
neighbourhood of the probability. 


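Law of large numbers. Let $X_1, X_2, \ldots, X_n, \ldots$ be a sequence of
random variables such that $S_n = X_1 + X_2 + \cdots + X_n$ has a finite mean
$M_n$ and standard deviation $\Sigma_n$ for all $n$. If $\Sigma_n = o(n)$, i.e.
$\Sigma_n/n \to 0$ as $n \to \infty$, then

$$\frac{S_n - M_n}{n} \xrightarrow{\text{in } p} 0 \quad \text{as } n \to \infty \tag{10.2.9}$$

Proof. We have

$$E\left(\frac{S_n}{n}\right) = \frac{M_n}{n}, \qquad \sigma\left(\frac{S_n}{n}\right) = \frac{\Sigma_n}{n}$$

Hence if $\Sigma_n/n \to 0$ as $n \to \infty$, (10.2.9) follows from Tchebycheff’s
theorem.

If $m_1, m_2, \ldots$ respectively denote the means of $X_1, X_2, \ldots$, then
$M_n = m_1 + m_2 + \cdots + m_n$, and writing

$$\bar X = (X_1 + X_2 + \cdots + X_n)/n = S_n/n$$

$$\bar m = (m_1 + m_2 + \cdots + m_n)/n = M_n/n$$

(10.2.9) may be written in the alternative form:

$$\bar X - \bar m \xrightarrow{\text{in } p} 0 \quad \text{as } n \to \infty \tag{10.2.10}$$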


CASE OF EQUAL COMPONENTS. If the random variables
$X_1, X_2, \ldots$ all have the same distribution with existent mean $m$
and finite standard deviation $\sigma$, and $X_1, X_2, \ldots, X_n$ are mutually
independent for all $n$, then $M_n = nm$ and $\Sigma_n = \sqrt{n}\,\sigma = o(n)$. Hence the law
of large numbers holds, and, moreover, the form (10.2.10) simplifies to

$$\bar X \xrightarrow{\text{in } p} m \quad \text{as } n \to \infty \tag{10.2.11}$$

We note that the condition $\Sigma_n = o(n)$ is sufficient for
the validity of the law of large numbers but is not necessary. For
the case of equal components, the condition of finiteness of the
standard deviation $\sigma$ (which is, however, essential for the above
method of proof) is also not necessary. It can be shown that the
law of large numbers for equal components holds under the simple
condition that the mean $m$ of the common distribution exists.
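
The law of large numbers for equal components can be illustrated by a small simulation. The sketch below is an illustration under stated assumptions, not a proof: throws of a fair die (mean 3.5) are a hypothetical choice of common distribution, the seed is fixed only so that the run is repeatable, and the function name is hypothetical.

```python
import random

def sample_mean_of_die(n, rng):
    """Arithmetic mean of n independent throws of a fair die."""
    return sum(rng.randint(1, 6) for _ in range(n)) / n

rng = random.Random(2024)                     # fixed seed for repeatability
# Sample means for increasing numbers of throws; the common mean is 3.5.
means = {n: sample_mean_of_die(n, rng) for n in (10, 1000, 100000)}
deviations = {n: abs(mean - 3.5) for n, mean in means.items()}
```

With 100,000 throws the standard deviation of the sample mean is roughly 0.005, so a deviation much larger than a few hundredths would be very improbable.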


10.3 EXERCISES

1. Prove the following generalisation of Tchebycheff’s inequality:
If $X$ has a finite second order moment and $c$ is any fixed number,
then, for any $\varepsilon > 0$

$$P(|X - c| \ge \varepsilon) \le \frac{E\{(X - c)^2\}}{\varepsilon^2}$$

2. If $X$ is a nonnegative random variable having mean $m$, prove that, for
any $\tau > 0$

$$P(X \ge \tau m) \le \frac{1}{\tau}$$

3. Show, by Tchebycheff’s inequality, that in 2,000 throws with a coin the
probability that the number of heads lies between 900 and 1,100 is at least 19/20.

4. A random variable $X$ has probability density function $12x^2(1 - x)$
($0 < x < 1$). Compute $P(|X - m| \ge 2\sigma)$, and compare it with the limit given by
Tchebycheff's inequality.

5. If $X$ is a $\gamma(n)$ variate, then show that

$$P(0 < X < 2n) \ge \frac{n - 1}{n}$$

6. Let $X_1, X_2, \ldots, X_n, \ldots$ be a sequence of pairwise uncorrelated random
variates having finite means $m_1, m_2, \ldots, m_n, \ldots$ and standard deviations
$\sigma_1, \sigma_2, \ldots, \sigma_n, \ldots$ respectively. Show that the law of large numbers holds
if the sequence $\{\sigma_n\}$ is bounded.

7. Obtain Bernoulli's theorem as a particular case of the law of large
numbers for equal components.

8. In a Poisson sequence of $n$ trials, if $f$ denotes the frequency ratio of
successes and $\bar p = \dfrac{1}{n}\sum_{i=1}^{n} p_i$ where $p_i$ is the probability of success in the
$i$th trial ($i = 1, 2, \ldots, n$), then prove that $f - \bar p \xrightarrow{\text{in } p} 0$ as $n \to \infty$.

CHAPTER 11 
LIMIT THEOREMS 


11.1 NORMAL APPROXIMATION TO THE BINOMIAL
DISTRIBUTION

We have already considered one approximation to the binomial
$(n, p)$ distribution, viz. the Poisson approximation which holds if $p$
is small and $n$ large such that $np$ is of moderate magnitude. But
if $p$ is not small but of moderate magnitude and $n$ is large, then
we shall show that the binomial distribution approximates to the
continuous normal distribution with parameters $(np, \sqrt{npq})$. This is
a very important result which is precisely stated in the following
theorem.

DeMoivre-Laplace limit theorem. Let $X_n$ be a binomial $(n, p)$
variate ($0 < p < 1$), the corresponding standardised variate being

$$X_n^* = \frac{X_n - np}{\sqrt{npq}} \qquad (q = 1 - p)$$

Then for any fixed numbers $a$, $b$ ($> a$)

$$\lim_{n\to\infty} P(a < X_n^* \le b) = \frac{1}{\sqrt{2\pi}}\int_a^b e^{-x^2/2}\,dx \tag{11.1.1}$$

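Proof. The spectrum of $X_n$ consists of the points $0, 1, 2, \ldots, n$,
and

$$f_i = P(X_n = i) = \binom{n}{i}p^iq^{n-i} \qquad (i = 0, 1, \ldots, n) \tag{i}$$

Hence the spectrum points $x_i^*$ of $X_n^*$ are given by

$$x_i^* = \frac{i - np}{\sqrt{npq}} \qquad (i = 0, 1, \ldots, n) \tag{ii}$$

and

$$P(X_n^* = x_i^*) = P(X_n = i) = f_i \tag{iii}$$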

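Then

$$\Delta x_i^* = x_{i+1}^* - x_i^* = \frac{1}{\sqrt{npq}} \tag{iv}$$

We note that $\Delta x_i^* \to 0$ as $n \to \infty$, and this explains how the discrete
spectrum of $X_n^*$ tends to a continuous spectrum over the entire real
axis as $n \to \infty$. We have

$$P(a < X_n^* \le b) = \sum_{a < x_i^* \le b} f_i \tag{v}$$

By (ii)

$$i = np + x_i^*\sqrt{npq}, \qquad n - i = nq - x_i^*\sqrt{npq}$$

For $a < x_i^* \le b$, $a$, $b$ being fixed numbers, $i$ and $n - i$ both $\to \infty$
as $n \to \infty$. Then

$$\log f_i = \log n! - \log i! - \log(n - i)! + i\log p + (n - i)\log q$$

and using Stirling's formula:

$$\log \nu! = \log\sqrt{2\pi} + (\nu + \tfrac12)\log\nu - \nu + \theta/12\nu \qquad (0 < \theta < 1)$$

we get

$$\log f_i = -\log\sqrt{2\pi npq} - (i + \tfrac12)\log\left(\frac{i}{np}\right) - (n - i + \tfrac12)\log\left(\frac{n - i}{nq}\right) + \frac{1}{12}\left(\frac{\theta_1}{n} - \frac{\theta_2}{i} - \frac{\theta_3}{n - i}\right) \tag{vi}$$

$$(0 < \theta_1, \theta_2, \theta_3 < 1)$$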

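Now

$$\frac{i}{np} = 1 + x_i^*\sqrt{\frac{q}{np}}, \qquad \frac{n - i}{nq} = 1 - x_i^*\sqrt{\frac{p}{nq}}$$

We know, from Taylor's theorem, that for $|x| < \tfrac12$

$$\log(1 + x) = x - \tfrac12 x^2 + \lambda x^3 \qquad (|\lambda| < 1)$$

For sufficiently large $n$, $x_i^*\sqrt{q/np}$ and $x_i^*\sqrt{p/nq}$ can be made
numerically less than $\tfrac12$, and hence

$$\log\left(\frac{i}{np}\right) = \log\left(1 + x_i^*\sqrt{\frac{q}{np}}\right) = x_i^*\sqrt{\frac{q}{np}} - \frac12 x_i^{*2}\frac{q}{np} + \lambda_1 x_i^{*3}\left(\frac{q}{np}\right)^{3/2}$$

and

$$\log\left(\frac{n - i}{nq}\right) = \log\left(1 - x_i^*\sqrt{\frac{p}{nq}}\right) = -x_i^*\sqrt{\frac{p}{nq}} - \frac12 x_i^{*2}\frac{p}{nq} - \lambda_2 x_i^{*3}\left(\frac{p}{nq}\right)^{3/2}$$

$$(|\lambda_1|, |\lambda_2| < 1)$$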


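Inserting these in (vi), we can write

$$\log f_i = -\log\sqrt{2\pi npq} - \frac{x_i^{*2}}{2} + \frac{r}{\sqrt{n}}$$

where $r = r(n, x_i^*)$ is such that $|r| < A$, $A$ being a constant independent
of $n$ and $x_i^*$. Hence

$$f_i = \frac{1}{\sqrt{2\pi npq}}\,e^{-x_i^{*2}/2} + \frac{R}{n}$$

where $R = R(n, x_i^*)$ is such that $|R| < B$, a constant.
Now by (iv) and (v)

$$P(a < X_n^* \le b) = \frac{1}{\sqrt{2\pi}}\sum_{a < x_i^* \le b} e^{-x_i^{*2}/2}\,\Delta x_i^* + \frac{1}{n}\sum_{a < x_i^* \le b} R$$

Since $\Delta x_i^* = 1/\sqrt{npq}$, the number of terms in the sum is
$\le (b - a)\sqrt{npq} + 1$, and hence

$$\left|\frac{1}{n}\sum_{a < x_i^* \le b} R\right| < \frac{B}{n}\{(b - a)\sqrt{npq} + 1\} \to 0 \quad \text{as } n \to \infty$$

Therefore, the second sum $\to 0$ as $n \to \infty$, so that

$$\lim_{n\to\infty} P(a < X_n^* \le b) = \frac{1}{\sqrt{2\pi}}\lim_{n\to\infty}\sum_{a < x_i^* \le b} e^{-x_i^{*2}/2}\,\Delta x_i^* = \frac{1}{\sqrt{2\pi}}\int_a^b e^{-x^2/2}\,dx$$

This completes the proof of the theorem.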


As introduced in (5.8.5), let $\phi(x)$ and $\Phi(x)$ denote the standard
normal density and distribution functions respectively, in terms of
which (11.1.1) may be written as

$$\lim_{n\to\infty} P(a < X_n^* \le b) = \int_a^b \phi(x)\,dx = \Phi(b) - \Phi(a) \tag{11.1.2}$$

If $F_n(x)$ denotes the distribution function of $X_n$ and $F_n^*(x)$ that
of $X_n^*$, then making $a \to -\infty$ and replacing $b$ by $x$ we get

$$\lim_{n\to\infty} F_n^*(x) = \Phi(x) \tag{11.1.3}$$

Also (11.1.3) clearly implies (11.1.2), i.e. (11.1.3) gives an equivalent
form of the above theorem, which states that the distribution function
of the standardised binomial $(n, p)$ distribution tends to the standard
normal distribution function as $n$ tends to infinity, provided $p$ is kept
fixed.

In terms of the variate $X_n$, (11.1.2) may be written in the form

$$\lim_{n\to\infty} P(np + a\sqrt{npq} < X_n \le np + b\sqrt{npq}) = \int_a^b \phi(x)\,dx \tag{11.1.4}$$

A working statement of (11.1.4) will be that if $n$ is large and $p$ is of
moderate magnitude, we have the approximation formula

$$P(np + a\sqrt{npq} < X_n \le np + b\sqrt{npq}) \simeq \int_a^b \phi(x)\,dx \tag{11.1.5}$$
a 


Example. If a die is thrown 1,800 times, find the probability that the frequency
of the event ‘multiple of three’ lies within $600 \pm 50$.

The frequency of the given event is binomially distributed with parameters
$n = 1{,}800$ and $p = \tfrac13$. Here $p$ is not small and $n$ is large so that we can use the
normal approximation. We have $np = 600$, $\sqrt{npq} = 20$, and, putting $a = -2.5$ and
$b = 2.5$ in (11.1.5), the required probability approximately equals

$$\int_{-2.5}^{2.5} \phi(x)\,dx = 0.988$$

(The numerical value of the integral is obtained from Table I at the end of
the book.)
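
The quality of the approximation in this example can be judged by computing the binomial probability exactly. The following is a minimal sketch: it assumes that "within 600 ± 50" means frequencies from 550 to 650 inclusive, evaluates the normal integral through the error function instead of Table I, and uses hypothetical function names.

```python
import math

def binom_pmf(k, n, p):
    """Binomial (n, p) probability mass at k, computed through logarithms."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1 - p))
    return math.exp(log_pmf)

def normal_cdf(x):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

n, p = 1800, 1 / 3
mean = n * p                                  # np = 600
sd = math.sqrt(n * p * (1 - p))               # sqrt(npq) = 20

# Exact probability that the frequency lies in 550..650 (inclusive, by assumption).
exact = sum(binom_pmf(k, n, p) for k in range(550, 651))
# Normal approximation (11.1.5) with a = -2.5 and b = 2.5.
approx = normal_cdf(2.5) - normal_cdf(-2.5)
```

The two values should agree to about two decimal places, which is in line with the working statement of (11.1.5) for large n.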


For a graphical representation of this approximation, we note that
the probability differential $\phi(x)\,dx$ corresponds to $f_i$ so that $\phi(x)$ will
correspond to $f_i/\Delta x_i^* = f_i\sqrt{npq}$. Fig. 23 shows the modified probability
diagram of the binomial distribution, in which the ordinates are
$f_i\sqrt{npq}$ instead of $f_i$ for n= 16, p=4, together with the standard
normal density curve.

Fig. 23. Normal Approximation to the Binomial Distribution
11.2 FUNDAMENTAL LIMIT THEOREMS 
For convenience of expression, we shall make use of the following
terminology.

Asymptotically normal. Let $X_1, X_2, \ldots, X_n, \ldots$ be a sequence of
random variables, and $\alpha_1, \alpha_2, \ldots, \alpha_n, \ldots$ and $\beta_1, \beta_2, \ldots, \beta_n, \ldots$ two sequences of
constants. If the distribution function of the random variable $\dfrac{X_n - \alpha_n}{\beta_n}$
tends to $\Phi(x)$, the standard normal distribution function, for all $x$, then
$X_n$ is said to be asymptotically normal $(\alpha_n, \beta_n)$. It is, however, not
implied that $\alpha_n$, $\beta_n$ respectively denote the mean and standard deviation
of $X_n$; these may not even exist. The practical sense of this
expression is obviously that the distribution of $X_n$ is approximately
normal $(\alpha_n, \beta_n)$ for large values of $n$.

Now if $X_1, X_2, \ldots, X_n$ are mutually independent random variables
each binomial $(1, p)$, then by the reproductive property of the binomial
distribution, their sum $S_n = X_1 + X_2 + \cdots + X_n$ is a binomial $(n, p)$
variate, and DeMoivre-Laplace limit theorem states that $S_n$ is
asymptotically normal $(np, \sqrt{npq})$. It is interesting to know that this
theorem holds not only for a sequence of binomial variates, but, in
general, for a large class of sequences of random variables obeying
certain conditions. This is expressed by the following fundamental
theorem known as the central limit theorem which we state without
proof.

Central limit theorem. Let $X_1, X_2, \ldots, X_n, \ldots$ be a sequence of
random variables having finite variances such that $X_1, X_2, \ldots, X_n$ is
a set of mutually independent random variables for all $n$, so that their
sum

$$S_n = X_1 + X_2 + \cdots + X_n$$

has a finite mean $M_n$ and standard deviation $\Sigma_n$. Now a necessary
and sufficient condition for $S_n$ to be asymptotically normal $(M_n, \Sigma_n)$
is that, for any given $\tau > 0$,

$$\lim_{n\to\infty}\frac{1}{\Sigma_n^2}\sum_{k=1}^{n}\int_{|x - m_k| > \tau\Sigma_n} (x - m_k)^2 f_k(x)\,dx = 0 \tag{11.2.1}$$

for the continuous case, or

$$\lim_{n\to\infty}\frac{1}{\Sigma_n^2}\sum_{k=1}^{n}\ \sum_{|x_{ki} - m_k| > \tau\Sigma_n} (x_{ki} - m_k)^2 f_{ki} = 0 \tag{11.2.2}$$

for the discrete case where $m_k$ is the mean of $X_k$, $f_k(x)$ the density
function of $X_k$ for the continuous case, and for the discrete case $x_{ki}$
denotes the general point of the spectrum of $X_k$ having probability
mass $f_{ki}$. The above condition is called Lindeberg's condition.


This condition is, however, not very restrictive and is satisfied by
many a sequence of random variables. In particular, we shall now
show that the condition is satisfied for the case of equal components.

CASE OF EQUAL COMPONENTS. If the random variables $X_1$,
$X_2, \ldots$ all have the same distribution with mean $m$ and standard
deviation $\sigma$, then Lindeberg’s condition is fulfilled, and hence $S_n$ is
asymptotically normal $(nm, \sqrt{n}\,\sigma)$, as in this case $M_n = nm$, $\Sigma_n = \sqrt{n}\,\sigma$.

Proof. Let us prove it for the continuous case, the proof for the
discrete case being similar.

$$\text{L.H.S. of (11.2.1)} = \frac{1}{\sigma^2}\lim_{n\to\infty}\int_{|x - m| > \tau\sqrt{n}\,\sigma} (x - m)^2 f(x)\,dx$$

where $f(x)$ is the common density function of the random variables.
Since $\int_{-\infty}^{\infty} (x - m)^2 f(x)\,dx$ exists, the R.H.S. is clearly zero for
any fixed $\tau > 0$. Hence the proof.

We note $\dfrac{S_n - nm}{\sqrt{n}\,\sigma} = \dfrac{\bar X - m}{\sigma/\sqrt{n}}$, and hence the central limit theorem for
equal components may also be stated as: $\bar X$ is asymptotically normal
$(m, \sigma/\sqrt{n})$.

Remarks

1. The central limit theorem holds for equal components if simply
their common distribution has a finite variance.

2. EXAMPLE OF FAILURE: CAUCHY DISTRIBUTION. If each
random variable of the sequence $X_1, X_2, \ldots$ has a Cauchy distribution
having parameters $(\lambda, \mu)$, then we know that $\bar X$ also has a Cauchy
distribution having the same parameters (cf. Ex. 28 Sec. 6.8) and thus is
not asymptotically normal. The violation of the central limit theorem
is, however, a consequence of the fact that the Cauchy distribution has
no finite mean and variance.


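3. For the case of equal components, we can show that the
central limit theorem implies the law of large numbers. If the
sequence $X_1, X_2, \ldots$ obeys the central limit theorem, the distribution
function of $\sqrt{n}(\bar X - m)/\sigma$ tends to $\Phi(x)$ for all $x$. Let $a$ ($> 0$) be a
given number. Then for any $\varepsilon > 0$,

$$\begin{aligned} P(|\bar X - m| \ge \varepsilon) &= P\left(\left|\frac{\sqrt{n}(\bar X - m)}{\sigma}\right| \ge \frac{\sqrt{n}\,\varepsilon}{\sigma}\right) \\ &\le P\left(\left|\frac{\sqrt{n}(\bar X - m)}{\sigma}\right| \ge a\right) \quad \text{for } n > a^2\sigma^2/\varepsilon^2 \\ &\to 2\{1 - \Phi(a)\} \quad \text{as } n \to \infty \end{aligned}$$

or

$$0 \le \lim_{n\to\infty} P(|\bar X - m| \ge \varepsilon) \le 2\{1 - \Phi(a)\}$$

Since this holds for all $a$, we have making $a \to \infty$

$$\lim_{n\to\infty} P(|\bar X - m| \ge \varepsilon) = 0$$

which shows that $\bar X \xrightarrow{\text{in } p} m$ as $n \to \infty$, i.e. the law of large numbers
holds.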


The converse of this is not true, for the law of large numbers holds
even for sequences of random variables for which the variance of their
common distribution does not exist, and as such the central limit
theorem cannot hold. These remarks do not, however, apply for the
general case of non-identically distributed random variables.
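
The central limit theorem for equal components can be illustrated without simulation by taking throws of a fair die as the common distribution, a hypothetical choice made only for this sketch. The exact distribution of the sum is built by repeated convolution, standardised with mean 3.5n and standard deviation sqrt(35n/12), and compared with the standard normal distribution function on a grid of points; the function names are hypothetical.

```python
import math

def normal_cdf(x):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def pmf_of_sum_of_dice(n):
    """Exact probability masses of the sum of n fair dice, as {total: probability}."""
    pmf = {0: 1.0}
    for _ in range(n):
        new = {}
        for total, prob in pmf.items():
            for face in range(1, 7):
                new[total + face] = new.get(total + face, 0.0) + prob / 6
        pmf = new
    return pmf

def max_gap(n, grid):
    """Largest gap on the grid between the distribution function of the
    standardised sum of n dice and the standard normal distribution function."""
    pmf = pmf_of_sum_of_dice(n)
    m, sigma = 3.5, math.sqrt(35 / 12)        # mean and s.d. of one throw
    worst = 0.0
    for x in grid:
        cutoff = n * m + x * math.sqrt(n) * sigma
        cdf = sum(prob for total, prob in pmf.items() if total <= cutoff)
        worst = max(worst, abs(cdf - normal_cdf(x)))
    return worst

grid = [i / 20 for i in range(-60, 61)]       # x from -3 to 3 in steps of 0.05
gaps = {n: max_gap(n, grid) for n in (1, 5, 50)}
```

The gap for a single die is large because its distribution function has jumps of 1/6; for 50 dice the jumps, and with them the gap, are much smaller.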

We shall now state, without proof, another fundamental limit 
theorem concerning characteristic functions. 


Limit theorem for characteristic functions. Let $X_1, X_2, \ldots, X_n, \ldots$
be a sequence of random variables having distribution functions $F_1(x)$,
$F_2(x), \ldots, F_n(x), \ldots$ and characteristic functions $\chi_1(t), \chi_2(t), \ldots, \chi_n(t), \ldots$
respectively. If as $n \to \infty$, $F_n(x) \to$ a distribution function $F(x)$ which
determines the characteristic function $\chi(t)$, then $\chi_n(t) \to \chi(t)$ as $n \to \infty$.
Conversely, if $\chi_n(t) \to$ a characteristic function $\chi(t)$ as $n \to \infty$, then
$F_n(x) \to F(x)$, the distribution function corresponding to $\chi(t)$.
With the help of this theorem, we can prove with surprising ease
the theorems on Poisson and normal limits of the binomial distribution.

POISSON DISTRIBUTION AS A LIMIT OF THE BINOMIAL DISTRIBUTION.
The characteristic function $\chi_n(t)$ of the binomial $(n, p)$ distribution
is given by

$$\chi_n(t) = (pe^{it} + q)^n = \{1 + p(e^{it} - 1)\}^n$$

Set $p = \mu/n$, $\mu$ being a fixed positive number and make $n \to \infty$. Then

$$\chi_n(t) = \left\{1 + \frac{\mu(e^{it} - 1)}{n}\right\}^n \to e^{\mu(e^{it} - 1)} \quad \text{as } n \to \infty$$

which is the characteristic function of the Poisson-$\mu$ distribution.
Hence it follows from the above theorem that the binomial $(n, \mu/n)$
($\mu > 0$) distribution tends to the Poisson-$\mu$ distribution as $n \to \infty$.
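
The convergence of the characteristic functions used in this argument can be observed numerically. The sketch below evaluates both characteristic functions at one fixed argument; the values of the Poisson parameter (2.0) and of t (0.8) are arbitrary illustrative choices and the function names are hypothetical.

```python
import cmath

def binomial_cf(t, n, p):
    """Characteristic function (p e^{it} + q)^n of the binomial (n, p) distribution."""
    return (p * cmath.exp(1j * t) + (1 - p)) ** n

def poisson_cf(t, mu):
    """Characteristic function exp{mu (e^{it} - 1)} of the Poisson-mu distribution."""
    return cmath.exp(mu * (cmath.exp(1j * t) - 1))

mu, t = 2.0, 0.8
# Distance between the two characteristic functions with p = mu / n, for growing n.
gaps = [abs(binomial_cf(t, n, mu / n) - poisson_cf(t, mu))
        for n in (10, 100, 10000)]
```

The gaps shrink roughly in proportion to 1/n, in keeping with the limit stated above.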


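DEMOIVRE-LAPLACE THEOREM. The characteristic function $\chi_n^*(t)$
of $X_n^* = (X_n - np)/\sqrt{npq}$ is given by

\[ \chi_n^*(t) = e^{-inpt/\sqrt{npq}} \left(p e^{it/\sqrt{npq}} + q\right)^n
   = \left(p e^{iqt/\sqrt{npq}} + q e^{-ipt/\sqrt{npq}}\right)^n \]

We know, for any real $x$,

\[ e^{ix} = 1 + ix - \frac{x^2}{2} + \theta \frac{x^3}{6} \]

where $\theta$ is a complex quantity such that $|\theta| \le 1$. Hence: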




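\[ e^{iqt/\sqrt{npq}} = 1 + \frac{iqt}{\sqrt{npq}} - \frac{q^2 t^2}{2npq} + \theta_1 \frac{q^3 t^3}{6(npq)^{3/2}} \]

and

\[ e^{-ipt/\sqrt{npq}} = 1 - \frac{ipt}{\sqrt{npq}} - \frac{p^2 t^2}{2npq} + \theta_2 \frac{p^3 t^3}{6(npq)^{3/2}}
   \qquad (|\theta_1|, |\theta_2| \le 1) \]

so that

\[ p e^{iqt/\sqrt{npq}} + q e^{-ipt/\sqrt{npq}} = 1 - \frac{t^2}{2n} + \theta \frac{t^3}{6(npq)^{3/2}}
   \qquad (|\theta| \le 1) \]

and

\[ \chi_n^*(t) = \left(1 - \frac{t^2}{2n} + \theta \frac{t^3}{6(npq)^{3/2}}\right)^n \to e^{-t^2/2}
   \quad \text{as } n \to \infty \]

Since $e^{-t^2/2}$ is the characteristic function of the standard normal
distribution, the DeMoivre-Laplace limit theorem follows.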
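
The theorem is what justifies replacing binomial probabilities by normal ones when n is large. A minimal sketch in Python follows, with n, p and the interval chosen purely for illustration (they are assumptions, not values from the text); no continuity correction is applied.

```python
import math

def binomial_prob_between(n, p, lo, hi):
    """Exact P(lo <= X <= hi) for a binomial (n, p) variate X."""
    q = 1.0 - p
    return sum(math.comb(n, k) * p**k * q**(n - k) for k in range(lo, hi + 1))

def std_normal_cdf(x):
    """Distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def normal_approx_between(n, p, lo, hi):
    """Normal approximation to the same probability, using mean np and s.d. sqrt(npq)."""
    m, s = n * p, math.sqrt(n * p * (1.0 - p))
    return std_normal_cdf((hi - m) / s) - std_normal_cdf((lo - m) / s)

n, p = 400, 0.5      # illustrative values only (assumption)
lo, hi = 180, 220    # np - 2*sqrt(npq) to np + 2*sqrt(npq), since sqrt(npq) = 10
exact = binomial_prob_between(n, p, lo, hi)
approx = normal_approx_between(n, p, lo, hi)
print(exact, approx)
```

The two printed values agree to about two decimal places; the remaining difference is mainly due to the discreteness of the binomial distribution.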


11.3 EXERCISES 


1. Find, using the normal approximation, the probability that the number of
heads in 2,000 throws with a coin lies between 900 and 1,100, and compare it with
the lower limit given by Tchebycheff's inequality in Ex. 3 Sec. 10.3.

2. Find the number of times a die has to be thrown such that the probability 
that the difference between the frequency ratio of sixes and 1/6 is in absolute value 
less than .01 is .99. 

3. Show that the central limit theorem holds under the following sufficient
condition called Liapounoff's condition: The random variables $X_1, X_2, \ldots, X_n, \ldots$
have finite third order moments such that

\[ \lim_{n \to \infty} \frac{1}{\Sigma_n^3} \sum_{k=1}^{n} E(|X_k - m_k|^3) = 0 \]

4. Prove that if Lindeberg's condition is fulfilled, $\Sigma_n \to \infty$ as $n \to \infty$.

5. Show that the density function of a standardised $\gamma(n)$ variate, $n$ being a
positive integer, tends to $\phi(x)$, the standard normal density function, as $n \to \infty$, and
hence deduce that a $\gamma(n)$ variate is asymptotically normal. (Assume, if necessary,
the validity of interchanging limit as $n \to \infty$ and the sign of integration for proving
the latter part.)

6. By the method of characteristic functions, show that (a) a $\chi^2$-variate with
$n$ degrees of freedom is asymptotically normal $(n, \sqrt{2n})$ and (b) a Poisson variate
with parameter $n$ (a positive integer) is asymptotically normal $(n, \sqrt{n})$.


MATHEMATICAL STATISTICS


CHAPTER 12 
RANDOM SAMPLES 


12.1 POPULATIONS AND SAMPLES 


Let E be a given random experiment and A any event connected with 
it. The basic problem of statistics may be regarded as the experi- 
mental determination of the probability of the event A. The method 
for this is readily obtained from the fundamental rule of inter- 
pretation of probabilities, viz. the frequency interpretation which 
states that if we repeat the experiment E under identical or uniform
conditions a large number of times, the observed frequency ratio of 
A gives an approximate experimental value of the probability of A. 
Such a problem is, in general, called a problem of estimation. An 
estimate of the probability being thus obtained, we can employ it 
for predicting the rough values of future frequency ratios. Another 
type of problem may arise as follows. Suppose we know from a 
theoretical model or otherwise that the probability of A should 
have a given value, and we want to test experimentally if our 
conjecture is tenable or not. This problem which is closely allied 
to the problem of estimation is known as testing of hypothesis.
Here also we are guided by the frequency interpretation; if we find 
that the frequency ratio of A for a long sequence of trials of E 
lies close to the suspected value of its probability, we may reason- 
ably believe that our hypothesis is correct, and if it is otherwise, 
we should reject the hypothesis. Now, as we can obviously see, 
exact estimation is not possible, for in order to compute the 
exact value of the probability we would have to repeat the random
experiment, so to say, an infinite number of times, which is a
practical impossibility. In practice, we have to depend on a finite
number of repetitions of E which give only an approximate value of
the probability. Naturally then, when an estimate is found, we
shall also be interested in knowing how good the estimate is.
Likewise, in the latter problem of testing of hypothesis exact decision is 
impossible ; we can only decide with varying degrees of reasonableness 




or confidence, and it is necessary to supplement any such decision by 
a quantitative statement of the degree of this confidence. 
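
The frequency interpretation on which both problems rest is easy to watch at work in a simulation. The sketch below is only an illustration under assumed conditions: E is the throw of a fair die, A is the event that a six turns up, and the pseudo-random generator with a fixed seed stands in for actual throws.

```python
import random

def frequency_ratio(n_trials, rng):
    """Frequency ratio of the event A = 'a six turns up' in n_trials throws of a fair die."""
    hits = sum(1 for _ in range(n_trials) if rng.randint(1, 6) == 6)
    return hits / n_trials

rng = random.Random(0)  # fixed seed so that the run is repeatable (assumption)
for n in (100, 10_000, 200_000):
    print(n, frequency_ratio(n, rng))
```

For the longer sequences the printed ratios settle near P(A) = 1/6, while the ratio for the short sequence may deviate noticeably; this is the sense in which a finite number of repetitions gives only an approximate value of the probability.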

The basic problems of Statistics as stated in the above forms are
rather trivial. But these acquire considerable complexity and importance
when we go over to the probability distribution of a random
variable X instead of simply the probability of an event. The probability
distribution will usually be unknown, and it will be our problem
to determine it experimentally or to test a hypothesis regarding the
distribution. The mode of experimentation will, however, be the same
as before, viz. repeating the random experiment E under uniform
conditions a large number of times and recording the result of each
repetition. The frequency ratio of the event $X = \xi_i$, where $\xi_i$ is any
point of the spectrum of X in the discrete case, or that of the event
$x < X \le x + dx$ in the continuous case, will then respectively approximate
to $P(X = \xi_i) = f_i$ or the probability differential $f(x)\,dx$, and in this
way we can roughly determine the distribution.
experiment is performed once, 
and corresponding to this eve 


conditions. This infinite sequence of repetitions of E will give rise
to an infinite number of values of X, the totality of which will be
called the population of the random variable X. Practical data are,
however, obtained by repeating the experiment a finite number
of times, say, n times, which yields a sequence of n observed values:

\[ x_1, x_2, \ldots, x_n \qquad (12.1.1) \]

of X, which is called a sample of size n drawn from the population of
X. If now we perform another sequence of n trials of E, the
sequence (12.1.1) will not be generally reproduced, but we shall get a
different sequence of values of X: $x_1', x_2', \ldots, x_n'$. Thus if different
samples of size n are repeatedly drawn under uniform conditions, the
sets of observed values of the random variable will fluctuate at
random. In this sense, the sample is said to be a random sample.




The individual values $x_1, x_2, \ldots, x_n$ of the sample (12.1.1) are usually
called the sample values. In statistical terminology, the distribution
of the given random variable X is often referred to as the distribution
of the population, e.g. we shall speak of the distribution function
of the population or the characteristics of the population to mean
the corresponding quantities belonging to the random variable. The
sample is thus the basic experimental material at our disposal, and
the methods of extracting information about the population (i.e. about
X) from the sample will form the subject-matter of statistical analysis.


Examples 

1. Let E denote the random experiment of throwing a die and
the random variable X the number on the die. If we imagine that 
the die is repeatedly thrown an infinitely large number of times, we 
would get an infinite sequence of observed values of X which forms 
the population of the random variable. If now the die is actually 
thrown 100 times, a sequence of 100 values of X is obtained, which is 
then a sample of size 100 from the population. 

2. Let E consist in selecting at random a male person in the city
of Calcutta in the age group of 20-25 years during a period of
one week, and measuring his height which is represented by the
random variable X. In order to repeat the experiment E under
uniform conditions, it is obviously necessary to replace the selected 
individual in the given category before making the next selection, 
for otherwise the composition of the category (comparable to an urn 
containing individuals) and hence the conditions of the experiment E 
would alter visibly. The population of heights of persons belonging 
to the given category will be obtained by repeating the experiment 
E an infinite number of times under uniform conditions. We remark
that this ideal population of heights is conceptually different from the 
totality of heights of the different individuals in the category ; the 
ideal population is always conceived as infinite, whereas the number of 
individuals in the given category is finite. If, however, the number of 
individuals is very large and may be taken to be infinite for practical 
purposes, as is presumably the case here, it follows from our discussions 
in Sec. 4.3 that it does not matter much if the successive drawings 
are made without replacements which is very often done in practice. 




12.2 DISTRIBUTION OF THE SAMPLE 

To describe stochastically the distribution of the sample values, we
define a fake random variable $X^*$ which takes the sample values
$x_1, x_2, \ldots, x_n$ each with probability $1/n$. (All the sample values are,
however, not necessarily distinct; in case a particular sample
value is repeated, say, 3 times in the sample, it gets a share of probability
mass $3/n$.) This empirical probability distribution of $X^*$, which
is always of the discrete type, is called the distribution of the sample
or the empirical distribution as distinct from the theoretical distribution
of the population. We shall now show the important fact that
if the size of the sample is large, the distribution of the sample
approximates to the distribution of the population.


Let $F(x)$ be the distribution function of the population and $F^*(x)$
that of the distribution of the sample, i.e. of the fake random
variable $X^*$. By definition

\[ F^*(x) = P(X^* \le x) = \nu/n \]

where $\nu$ is the number of sample values $\le x$. Now $\nu/n$ precisely
denotes the frequency ratio of the event $X \le x$, and hence for large $n$

\[ \nu/n \approx P(X \le x) = F(x) \]

or if $n$ is large

\[ F^*(x) \approx F(x) \qquad (12.2.1) \]

This is usually expressed by saying that the distribution of the sample
is the statistical image of the distribution of the population.
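
The distribution function of the sample can be computed directly from its definition. The following Python sketch assumes the convention that the distribution function counts the sample values less than or equal to x; the sample values used are hypothetical.

```python
from bisect import bisect_right

def empirical_cdf(sample):
    """Return the function x -> (number of sample values <= x) / n for the given sample."""
    ordered = sorted(sample)
    n = len(ordered)

    def f_star(x):
        # bisect_right counts the sample values that are <= x
        return bisect_right(ordered, x) / n

    return f_star

sample = [4, 4, 7, 5, 4]  # hypothetical sample values (assumption)
f_star = empirical_cdf(sample)
print(f_star(3.9), f_star(4), f_star(5), f_star(7))  # 0.0 0.6 0.8 1.0
```

The value 4 is repeated 3 times in this sample of size 5, so the step at x = 4 has height 3/5, as described above.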


12.3 TABLES AND GRAPHICAL REPRESENTATIONS 


Cumulative graph. The distribution curve of $X^*$: $y = F^*(x)$, which
is always a step curve, is called the cumulative graph or sometimes
the sum polygon of the sample. For large samples, it follows from
(12.2.1) that the cumulative graph will run close to the distribution
curve of the population: $y = F(x)$.


Discrete population : Frequency diagram. When the sample is
drawn from a seemingly discrete population, all the sample values
$x_1, x_2, \ldots, x_n$ are usually not distinct. Then let the distinct sample
values be denoted by $\xi_1, \xi_2, \ldots, \xi_m$, and $\nu_j$ be the frequency of $\xi_j$
(i.e. $\nu_j$ is the number of times the value $\xi_j$ is repeated in the sample)
$(j = 1, 2, \ldots, m)$, so that

\[ \sum_{j=1}^{m} \nu_j = n \qquad (12.3.1) \]


The data (12.1.1) may now be conveniently presented in a tabular
form in which the $\xi$'s are entered in one column and the corresponding $\nu$'s
in another, the sum of the $\nu$'s being equal to the size of the sample.

The spectrum of the empirical distribution then consists of the
points $\xi_1, \xi_2, \ldots, \xi_m$, and since $\nu_j$ is the frequency of $\xi_j$, the probability
mass at $\xi_j$ is $\nu_j/n$. Hence the probability diagram of the empirical
distribution is obtained by erecting an ordinate of height $\nu_j/n$ at each
point $\xi_j$. This is called the frequency diagram of the sample. We note
that the $\xi$'s are also points of the spectrum of the population (which may
not, however, include all of them) and the frequency ratio of the event
$X = \xi_j$ is $\nu_j/n$, and hence for large $n$

\[ \nu_j/n \approx P(X = \xi_j) = f_j \qquad (12.3.2) \]

i.e. the ordinates of the frequency diagram are approximately equal to
the corresponding ordinates of the probability diagram of the population
for large samples.

Example 1. The random variable represents the number of counts of α-rays
emitted by a radioactive source as recorded by a Geiger-Mueller counter for a
given time interval, and the results for 3,455 intervals are given by the following
table :

No. of counts   Frequency     No. of counts   Frequency
      0              8              9            220
      1             59             10            121
      2            177             11             85
      3            311             12             24
      4            492             13             22
      5            528             14              6
      6            601             15              3
      7            467
      8            331            Total          3455




Continuous population : Grouping of data. Histogram. If the 
population is continuous, all 


distinct, and a £-v table of the 
such cases the data is usually г 
process known as grouping into classes,
in practical statistics which may be descr 
Note the smallest and largest of the sample values, and select a convenient
interval $(a, b)$ containing all the sample values $x_1, x_2, \ldots, x_n$.
Divide $(a, b)$ into $m\ (< n)$ suitable sub-intervals by the points $t_0,
t_1, \ldots, t_m$ such that $a = t_0 < t_1 < \cdots < t_m = b$. The sub-intervals
$(t_{j-1}, t_j]$ $(j = 1, 2, \ldots, m)$ are called the class intervals or simply classes,
and the points $t_j$ the class limits. The number of sample values
belonging to each class interval $(t_{j-1}, t_j]$ is counted, which will
be called the corresponding class frequency and denoted by $\nu_j$. A
table is then drawn up showing the class frequencies $\nu_j$ against the
class intervals $(t_{j-1}, t_j]$. Data presented in this form are called
grouped data.


Remarks 


1. In a grouped table, however, we lose sight of the individual
sample values and thereby lose somewhat in exactness of knowledge,
but this is usually outweighed by the gain in advantage in handling
the data.


2. The class intervals are often taken to be of equal length, but
sometimes, depending on the nature of the data, unequal intervals are
also used.

To obtain a graphical representation of grouped data we proceed
as follows. Let $\xi_j$ denote the middle point of the jth class interval,
which is called the class midpoint or class mark, and $\Delta\xi_j$ the length
of the jth class, i.e.

\[ \xi_j = \tfrac{1}{2}(t_{j-1} + t_j), \quad \Delta\xi_j = t_j - t_{j-1} \quad (j = 1, 2, \ldots, m) \qquad (12.3.3) \]

On the jth class interval $(t_{j-1}, t_j]$ draw a rectangle of height $\nu_j/n\Delta\xi_j$
for all $j$; the resulting diagram will be called the histogram of the
sample. The tops of the rectangles form a graph having equation

\[ y = f^*(x) \qquad (12.3.4) \]




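where

\[ f^*(x) = \frac{\nu_j}{n\,\Delta\xi_j} \quad \text{if } t_{j-1} < x \le t_j \quad (j = 1, 2, \ldots, m) \]

Now the area of the jth rectangle is $\nu_j/n$, which is also the frequency
ratio of the event $t_{j-1} < X \le t_j$. Hence if the sample is sufficiently
large and the class intervals sufficiently small, then

\[ \nu_j/n \approx P(t_{j-1} < X \le t_j) = \int_{t_{j-1}}^{t_j} f(x)\,dx \approx f(\xi_j)\,\Delta\xi_j \]

where $f(x)$ denotes the density function of the population.
Since again

\[ \nu_j/n = f^*(\xi_j)\,\Delta\xi_j \]

we have

\[ f^*(\xi_j) \approx f(\xi_j) \quad (j = 1, 2, \ldots, m) \]

or we may write

\[ f^*(x) \approx f(x) \qquad (12.3.5) \]

Thus the upper part of the histogram is the statistical image of the
density curve of the population.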


Example 2. The following grouped data represent the rainfalls (in inches)
from June to September in North-west India recorded for 83 years from 1875 to
1957 :

Rainfall    Frequency     Rainfall    Frequency
  7-9           1          23-25         13
  9-11          1          25-27          6
 11-13          2          27-29          6
 13-15          7          29-31          2
 15-17          5          31-33          0
 17-19          9          33-35          0
 19-21          9          35-37          1
 21-23         21          Total         83


The histogram of the data is shown below. 


Fig. 24. Histogram for the Rainfall Data
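
The heights of the rectangles in such a histogram follow at once from the rule stated above (class frequency divided by n times the class length). A minimal Python sketch for the rainfall data, taking the class limits 7, 9, ..., 37 and the class frequencies as they are used in the computations of Sec. 12.5; the function name is illustrative.

```python
# Class limits t_0 < t_1 < ... < t_m and class frequencies of the rainfall data.
limits = [7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37]
freqs = [1, 1, 2, 7, 5, 9, 9, 21, 13, 6, 6, 2, 0, 0, 1]

def histogram_heights(limits, freqs):
    """Height of the j-th rectangle: class frequency / (n * class length)."""
    n = sum(freqs)
    return [v / (n * (hi - lo)) for v, lo, hi in zip(freqs, limits, limits[1:])]

heights = histogram_heights(limits, freqs)
# Each rectangle has area (class frequency)/n, so the total area is 1.
area = sum(h * (hi - lo) for h, lo, hi in zip(heights, limits, limits[1:]))
print(max(heights), area)
```

The tallest rectangle stands on the class 21-23 and has height 21/166, i.e. about 0.127.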


12.4 SAMPLE CHARACTERISTICS 

The characteristics such as the mean, variance, moments etc. of the
empirical distribution of the sample are called the characteristics of
the sample or the sample characteristics. These are obviously functions
of the sample values $x_1, x_2, \ldots, x_n$ and distinct from the characteristics
of the population, i.e. of the random variable X. To differentiate these,
we shall adopt the following useful system of notations. In general, a
population characteristic will be denoted by a Greek letter, as already
done in Ch. 7, and the corresponding sample characteristic by the
corresponding Roman letter, except for some special cases, e.g. the
population mean which is denoted by m. Accordingly, the principal
sample characteristics are defined and denoted as follows.


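sample mean $= E(X^*) = \frac{1}{n} \sum_{i=1}^{n} x_i$, the arithmetic mean of the
sample values, which may be conveniently denoted by $\bar{x}$, i.e.

\[ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \qquad (12.4.1) \]

The sample variance will be denoted by $S^2$ and defined by

\[ S^2 = \operatorname{var}(X^*) = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2 \qquad (12.4.2) \]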




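The sample kth moment,

\[ a_k = E(X^{*k}) = \frac{1}{n} \sum_{i=1}^{n} x_i^k \qquad (12.4.3) \]

We have

\[ a_0 = 1, \quad a_1 = \bar{x} \]

The sample kth central moment,

\[ m_k = E\{(X^* - \bar{x})^k\} = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^k \qquad (12.4.4) \]

So

\[ m_0 = 1, \quad m_1 = 0, \quad m_2 = S^2 \]

By (7.3.3) and (7.3.4) we have

\[ m_k = \sum_{i=0}^{k} (-1)^i \binom{k}{i} a_{k-i}\, \bar{x}^i \qquad (12.4.5) \]

or in details

\[ m_2 = S^2 = a_2 - \bar{x}^2 \]
\[ m_3 = a_3 - 3a_2\bar{x} + 2\bar{x}^3 \qquad (12.4.6) \]
\[ m_4 = a_4 - 4a_3\bar{x} + 6a_2\bar{x}^2 - 3\bar{x}^4 \]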

According to (7.10.2) the median $z_{1/2}$ of the sample will be a
point such that the number of sample values $< z_{1/2}$ is $\le \frac{1}{2}n$, and
the number of sample values $\le z_{1/2}$ is $\ge \frac{1}{2}n$. Thus if n is odd, the
median is always determinate. But if n is even, we have often a
case in which every point of an interval between two sample values
serves as a median; in such cases we follow our usual convention of
taking the middle point of the interval as the proper sample median.

The lower quartile $z_{1/4}$ of the sample is a point such that the
number of sample values $< z_{1/4}$ is $\le \frac{1}{4}n$, and the number of sample
values $\le z_{1/4}$ is $\ge \frac{1}{4}n$. The upper quartile $z_{3/4}$ is similarly defined,
and the semi-interquartile range $= \frac{1}{2}(z_{3/4} - z_{1/4})$.
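
The two counting conditions in these definitions translate into a short routine. The sketch below is one possible reading of the definition, with the mid-point convention applied whenever a whole interval qualifies; the function name and the sample are hypothetical.

```python
from fractions import Fraction

def sample_quantile(sample, p):
    """Point z with #{x_i < z} <= p*n and #{x_i <= z} >= p*n (0 < p < 1).

    When p*n is an integer k, every point between the k-th and (k+1)-th
    ordered values qualifies, and the middle point is returned.
    """
    xs = sorted(sample)
    n = len(xs)
    target = Fraction(p) * n
    k = int(target)                      # integral part of p*n
    if target == k and 0 < k < n:
        return (xs[k - 1] + xs[k]) / 2   # mid-point convention
    return xs[min(k, n - 1)]             # the (k+1)-th ordered value is the only such point

xs = [3, 1, 4, 1, 5, 9, 2, 6]            # hypothetical sample of even size
median = sample_quantile(xs, Fraction(1, 2))
lower = sample_quantile(xs, Fraction(1, 4))
upper = sample_quantile(xs, Fraction(3, 4))
print(median, lower, upper, (upper - lower) / 2)  # 3.5 1.5 5.5 2.0
```

Passing p as a Fraction keeps the test for an integral value of p*n exact.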

The mode of the sample is obviously the sample value having 
the maximum frequency, which will be denoted by #. 

The mean, median and the mode are the important measures of
location of the distribution of the sample. The important measures
of dispersion are the standard deviation and the semi-interquartile
range; another useful empirical measure of dispersion is the range of
the sample, which is defined to be the difference between the
maximum and the minimum sample values.


coefficient of skewness, $g_1 = \dfrac{m_3}{S^3}$ (12.4.7)
another measure of skewness = 3 Ё (12.4.8) 
coefficient of kurtosis, $b_2 = \dfrac{m_4}{m_2^2}$ (12.4.9)

coefficient of excess, $g_2 = \dfrac{m_4}{m_2^2} - 3$ (12.4.10)
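
The principal characteristics can be computed straight from these definitions when the individual sample values are available. A minimal Python sketch follows; the function name and the sample are hypothetical, and the divisor is n throughout, as in (12.4.2) and (12.4.4).

```python
import math

def sample_characteristics(xs):
    """Mean, variance S^2, and coefficients of skewness, kurtosis and excess of a sample."""
    n = len(xs)
    mean = sum(xs) / n

    def central(k):
        # k-th central moment of the sample, divisor n
        return sum((x - mean) ** k for x in xs) / n

    m2, m3, m4 = central(2), central(3), central(4)
    s = math.sqrt(m2)
    return {
        "mean": mean,
        "S2": m2,
        "g1": m3 / s**3,       # coefficient of skewness
        "b2": m4 / m2**2,      # coefficient of kurtosis
        "g2": m4 / m2**2 - 3,  # coefficient of excess
    }

print(sample_characteristics([1, 2, 3, 4, 5]))
```

For the symmetric sample 1, 2, 3, 4, 5 the mean is 3, the variance is 2 and the coefficient of skewness vanishes.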


12.5 COMPUTATION OF SAMPLE CHARACTERISTICS 


Discrete population. When the population is discrete, we have
seen that the data is usually presented in a $\xi$-$\nu$ table, and we have

\[ a_k = \frac{1}{n} \sum_{j=1}^{m} \nu_j \xi_j^k \qquad (12.5.1) \]

The given two-column table is now extended by adding columns
for $\nu\xi, \nu\xi^2, \ldots, \nu\xi^k$, and the total in each column (except the first)
is calculated. By (12.3.1) the total in the $\nu$-column must be $n$, the
size of the sample, and by dividing the other totals by $n$ we obtain
$a_1 = \bar{x}, a_2, \ldots, a_k$. A final column representing $\nu(\xi + 1)^k$ is often added
for checking the computation by means of the following formula :

CHECK FORMULA

\[ \sum \nu_j (\xi_j + 1)^k = \sum \nu_j \xi_j^k + \binom{k}{1} \sum \nu_j \xi_j^{k-1}
   + \binom{k}{2} \sum \nu_j \xi_j^{k-2} + \cdots + \sum \nu_j \qquad (12.5.2) \]

The $a_k$'s being thus obtained, we can calculate the central moments
$m_k$'s by (12.4.5) or (12.4.6). From these the important characteristics
like the mean, variance, coefficients of skewness and excess etc. may be
easily computed.

Linear transformation

We may sometimes considerably reduce our computational labour by
making a linear transformation $\xi_j \to \xi_j'$ given by

\[ \xi_j = a\xi_j' + b \]

which obviously amounts to a linear transformation of the same form
of the empirical random variable. If the characteristics of the
transformed variable, which are first calculated, are marked with
a prime, then it follows from the theory of probability that

\[ \bar{x} = a\bar{x}' + b, \quad S = |a|\,S', \quad m_k = a^k m_k' \qquad (12.5.3) \]

and hence

\[ g_1 = \frac{a}{|a|}\, g_1', \quad g_2 = g_2' \qquad (12.5.4) \]


Example 1. Compute the mean, variance, mode, median, range and the
semi-interquartile range of the sample given in Ex. 1 Sec. 12.3.

The computation is shown below.

Set ξ = ξ' + 7.

    ξ       ν      ξ'      νξ'      νξ'²    ν(ξ'+1)²
    0       8     -7       -56      392       288
    1      59     -6      -354     2124      1475
    2     177     -5      -885     4425      2832
    3     311     -4     -1244     4976      2799
    4     492     -3     -1476     4428      1968
    5     528     -2     -1056     2112       528
    6     601     -1      -601      601         0
    7     467      0         0        0       467
    8     331      1       331      331      1324
    9     220      2       440      880      1980
   10     121      3       363     1089      1936
   11      85      4       340     1360      2125
   12      24      5       120      600       864
   13      22      6       132      792      1078
   14       6      7        42      294       384
   15       3      8        24      192       243
 Total   3455            -3880    24596     20291


The computation is first checked by formula (12.5.2).

\[ \sum \nu\xi'^2 + 2\sum \nu\xi' + \sum \nu = 20291 = \sum \nu(\xi' + 1)^2 \]

Dividing the totals in the 4th and 5th columns by $n = 3455$, we get

\[ \bar{x}' = -1.1230, \quad a_2' = 7.1190 \]

By (12.4.6)

\[ S'^2 = 5.8579 \]

The actual characteristics are now given by the transformation formulas
(12.5.3).

\[ \bar{x} = 5.8770, \quad S^2 = 5.8579 \]

or retaining only up to 2 places of decimals

\[ \bar{x} = 5.88, \quad S^2 = 5.86 \]

The mode, median, quartiles and range are obtained directly from the original
table, i.e. the first two columns of the above table, the results being

mode $= 6$, $z_{1/2} = 6$, $z_{1/4} = 4$, $z_{3/4} = 7$

so that the semi-interquartile range $= 1.5$. The maximum and minimum sample
values are 15 and 0 respectively so that the range $= 15$.
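
The arithmetic of this example is easily reproduced by machine. The following Python sketch takes the frequency table of Ex. 1 Sec. 12.3, applies the shift by 7 used above, and verifies the check formula (12.5.2) for k = 2; the variable names are illustrative only.

```python
counts = list(range(16))  # the values 0, 1, ..., 15
freqs = [8, 59, 177, 311, 492, 528, 601, 467, 331, 220, 121, 85, 24, 22, 6, 3]

n = sum(freqs)                                  # size of the sample
shifted = [c - 7 for c in counts]               # shifted values, original value minus 7
sum1 = sum(v * s for v, s in zip(freqs, shifted))
sum2 = sum(v * s * s for v, s in zip(freqs, shifted))
check = sum(v * (s + 1) ** 2 for v, s in zip(freqs, shifted))

# Check formula (12.5.2) with k = 2
assert check == sum2 + 2 * sum1 + n

mean = 7 + sum1 / n                             # transformation formula (12.5.3) with a = 1, b = 7
variance = sum2 / n - (sum1 / n) ** 2           # unaffected by the shift
mode = counts[freqs.index(max(freqs))]
print(n, sum1, sum2, check, round(mean, 4), round(variance, 4), mode)
```

Carried at full precision the variance comes out as about 5.8578; the value 5.8579 quoted above appears to result from rounding the intermediate quantities to four decimal places before subtraction.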


Continuous population. For continuous populations, the data is
usually grouped into classes, and as such the individual sample values
$x_1, x_2, \ldots, x_n$ are not available for computation of the sample
characteristics. We can, however, make approximate calculations by
treating the class midpoints as $\xi$'s and the corresponding class frequencies
as $\nu$'s of the discrete case. If the class intervals are sufficiently small,
then this method will yield fairly good approximations to the actual
characteristics. These approximations can also be sometimes slightly
improved upon by what are known as Sheppard's corrections which
we shall not, however, consider.


Example 2. Find the mean, standard deviation, coefficients of skewness and
excess for the sample given in Ex. 2 Sec. 12.3.


Set ξ = 2ξ' + 22.

   ξ     ν    ξ'    νξ'    νξ'²    νξ'³     νξ'⁴    ν(ξ'+1)⁴
   8     1   -7     -7     49    -343     2401      1296
  10     1   -6     -6     36    -216     1296       625
  12     2   -5    -10     50    -250     1250       512
  14     7   -4    -28    112    -448     1792       567
  16     5   -3    -15     45    -135      405        80
  18     9   -2    -18     36     -72      144         9
  20     9   -1     -9      9      -9        9         0
  22    21    0      0      0       0        0        21
  24    13    1     13     13      13       13       208
  26     6    2     12     24      48       96       486
  28     6    3     18     54     162      486      1536
  30     2    4      8     32     128      512      1250
  32     0    5      0      0       0        0         0
  34     0    6      0      0       0        0         0
  36     1    7      7     49     343     2401      4096
Total   83         -35    509    -779    10805     10686

\[ \sum \nu\xi'^4 + 4\sum \nu\xi'^3 + 6\sum \nu\xi'^2 + 4\sum \nu\xi' + \sum \nu = 10686 = \sum \nu(\xi' + 1)^4 \quad \text{(Checked)} \]

Now

\[ \bar{x}' = -0.421687, \quad a_2' = 6.132530 \]
\[ a_3' = -9.385542, \quad a_4' = 130.180723 \]

By (12.4.6)

\[ S' = 2.440227, \quad m_3' = -1.777486, \quad m_4' = 120.797735 \]
\[ g_1' = -0.122325, \quad g_2' = 0.406729 \]

Hence by (12.5.3)

\[ \bar{x} = 21.156626, \quad S = 4.880454 \]
\[ g_1 = -0.122325, \quad g_2 = 0.406729 \]

Finally, the results may be presented as

\[ \bar{x} = 21.16, \quad S = 4.88, \quad g_1 = -0.12, \quad g_2 = 0.41 \]
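
The same computation may be sketched in Python, which also shows the transformation formulas (12.5.3) and (12.5.4) at work. The class midpoints 8, 10, ..., 36 and the coding constants a = 2, b = 22 are read off from the table above; the variable names are illustrative only.

```python
import math

mids = list(range(8, 38, 2))  # class midpoints 8, 10, ..., 36
freqs = [1, 1, 2, 7, 5, 9, 9, 21, 13, 6, 6, 2, 0, 0, 1]
a, b = 2, 22                  # coding: midpoint = a * coded value + b
coded = [(x - b) // a for x in mids]  # exact, since every midpoint minus 22 is even
n = sum(freqs)

def raw(k):
    """k-th moment of the coded values about the origin."""
    return sum(v * c**k for v, c in zip(freqs, coded)) / n

a1, a2, a3, a4 = raw(1), raw(2), raw(3), raw(4)
m2 = a2 - a1**2                                    # formulas (12.4.6)
m3 = a3 - 3 * a2 * a1 + 2 * a1**3
m4 = a4 - 4 * a3 * a1 + 6 * a2 * a1**2 - 3 * a1**4
s_coded = math.sqrt(m2)

mean = a * a1 + b             # (12.5.3)
s = abs(a) * s_coded          # (12.5.3)
g1 = m3 / s_coded**3          # unchanged by the coding since a > 0, see (12.5.4)
g2 = m4 / m2**2 - 3
print(round(mean, 2), round(s, 2), round(g1, 2), round(g2, 2))
```

Since these are grouped data, the figures are approximations in the sense explained above: each class is represented by its midpoint.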




12.6 EXERCISES 


1. The number of petals was counted for 22 flowers of a certain species with
the following results :

4, 4, 7, 5, 4, 4, 4, 5, 6, 5, 6
9, 4, 4, 4, 4, 5, 6, 4, 5, 4, 4

Draw up a frequency table, and find the mean, median and mode of the sample.


2. The weekly wages in rupees of 25 workers of a factory were recorded
to be

25.50, 21.00, 42.75, 31.75, 16.00
16.25, 20.25, 22.25, 24.35, 30.50
22.25, 20.25, 24.00, 36.75, 40.00
27.50, 23.00, 18.75, 20.25, 24.75
18.50, 33.00, 34.75, 20.25, 26.00

Arrange the data into suitable classes, and compute the mean and variance of the
sample.

3. An experiment consists in throwing a die 5 times and noting the number
of sixes. The experiment was repeated 200 times with the following results :

No. of sixes    0    1    2    3    4    5
Frequency      58   86   40   14    2    0

Find the sample mean and standard deviation.


4. The number of telephone calls received daily in a certain house in Calcutta
was recorded for 92 days from 1st May to 31st July 1962, and the following data
were obtained :

No. of calls   Frequency     No. of calls   Frequency
     3             2              10            11
     4             5              11             7
     5            10              12             4
     6             8              13             4
     7            12              14             2
     8            12
     9            15             Total          92

Compute the mean, variance, coefficients of skewness and excess, mode, median,
semi-interquartile range and range of the sample.




5. The results of 150 determinations of the specific gravity of Ethyl Alcohol
are classified as follows :

Specific gravity   Frequency     Specific gravity   Frequency
  .765-.770            2           .795-.800           21
  .770-.775            7           .800-.805           12
  .775-.780           17           .805-.810            9
  .780-.785           18           .810-.815            9
  .785-.790           24           .815-.820            5
  .790-.795           26            Total             150

Find the mean, standard deviation and the coefficients of skewness and excess of
the sample.


CHAPTER 13 


SAMPLING DISTRIBUTIONS 
13.1 SAMPLING DISTRIBUTIONS OF ‘STATISTIC’S 


Consider a sample $x_1, x_2, \ldots, x_n$ of size n drawn from the population
of a given random variable X. We have remarked that the sample
is a random sample in the sense that if we repeatedly draw samples
of size n from the population of X under uniform conditions, the sets
of sample values would fluctuate at random. This randomness of the
sample may be mathematically described in the following way. The
first sample value $x_1$ may be regarded as the observed value of a
random variable $X_1$, the second sample value $x_2$ that of another random
variable $X_2$ etc., and finally the nth sample value the observed value
of a random variable $X_n$. But all the sample values $x_1, x_2, \ldots, x_n$ are,
in fact, observed values of the parent random variable X, and as such
the random variables $X_1, X_2, \ldots, X_n$ must all have the same distribution,
viz. that of X, i.e. of the population. Moreover, since the
sample values are given by repetitions of the random experiment E
under uniform conditions, it follows that the random variables $X_1,
X_2, \ldots, X_n$ should be mutually independent. Note that these random
variables are connected with the compound experiment of n independent
repetitions of the given experiment E. To sum up, the sample
values $x_1, x_2, \ldots, x_n$ are respectively regarded as observed values of
the random variables $X_1, X_2, \ldots, X_n$ which are mutually independent,
each having the distribution of the population. Or if we treat the
sample values themselves as random variables, a random sample of
size n from the population of X may be defined to be a set of n
mutually independent and identically distributed random variables $X_1,
X_2, \ldots, X_n$, each having the distribution of X. This idea of random
sample is of fundamental importance in mathematical statistics.

The n-dimensional random variable

\[ \mathbf{x} = (X_1, X_2, \ldots, X_n) \qquad (13.1.1) \]

represents a random point in an n-dimensional space $R^n$, which will
be called the sample point and $R^n$ the sample space. A sample of size
n, $(x_1, x_2, \ldots, x_n)$, may then be regarded as an observed value of the
sample point $\mathbf{x}$. The distribution function of the sample point, i.e.
the joint distribution function of $X_1, X_2, \ldots, X_n$, is given by

\[ F(x_1, x_2, \ldots, x_n) = F(x_1)\,F(x_2) \cdots F(x_n) \qquad (13.1.2) \]

where $F(x)$ denotes the distribution function of the population.

Any function of the sample values $x_1, x_2, \ldots, x_n$ is, in general,
called a statistic; for example, the sample mean $\bar{x}$, the sample variance
$S^2$ etc. are all statistics. Let $a = a(x_1, x_2, \ldots, x_n)$ denote any statistic,
which may be regarded as an observed value of the corresponding
random variable $A = a(X_1, X_2, \ldots, X_n)$. Now given the distribution
function of the population $F(x)$, the joint distribution of $X_1, X_2, \ldots,
X_n$ is obtained from (13.1.2) which, we know, can uniquely determine
the distribution of the function $A = a(X_1, X_2, \ldots, X_n)$ of the random
variables $X_1, X_2, \ldots, X_n$. This probability distribution of the random
variable A will be called the sampling distribution of the statistic
$a = a(x_1, x_2, \ldots, x_n)$.


Notations. For avoiding the boredom of frequent restatements,
we shall stick to the following permanent system of notations as far as
practicable.

(i) $x_1, x_2, \ldots, x_n$ will denote a sample of size n from the population
of X, which will be treated as observed values of the random
variables $X_1, X_2, \ldots, X_n$ respectively. The running real variables
corresponding to $X_1, X_2, \ldots, X_n$, for writing the distribution function
etc., will be denoted, in keeping with our usual practice in the theory
of probability, again by $x_1, x_2, \ldots, x_n$ as we have already done in
(13.1.2). We hope that this will not give rise to any confusion; it will
always be clear from the context if $x_1, x_2, \ldots, x_n$ denote the sample
values or the real variables.

(ii) A statistic will usually (with some exceptions) be denoted by
a small Roman letter, as we have proposed for the case of sample
characteristics; the corresponding random variable will then be
denoted by the corresponding bold capital letter, e.g. if $a = a(x_1,
x_2, \ldots, x_n)$ is any statistic, the random variable corresponding to it
will be $A = a(X_1, X_2, \ldots, X_n)$.


In the above notations, the random variables corresponding to the
sample characteristics are given by

\[ \bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i, \quad S^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2, \quad
   A_k = \frac{1}{n} \sum_{i=1}^{n} X_i^k, \quad M_k = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^k \qquad (13.1.3) \]

\[ G_1 = \frac{M_3}{S^3}, \quad G_2 = \frac{M_4}{S^4} - 3 \]

etc. Speaking about the sample characteristics, we may also conveniently
say "sampling distributions of the mean, variance etc." to mean the
probability distributions of $\bar{X}$, $S^2$ etc. respectively.

When a sample of size n is drawn from the population of X, we
get an observed value of the n-dimensional random variable $\mathbf{x}$,
the sample point, so that we can also conceive of the population
of the sample point if we fix our basic random experiment as
drawing a sample of size n from the population of X, which, in other
words, is the compound experiment of n independent trials of the
given experiment E. If a sequence of m samples of fixed size n are
drawn under uniform conditions, we shall get a sample of size m from
the population of $\mathbf{x}$. Likewise, a sample of size n from the population
of X yields an observed value of any statistic $A = a(X_1, X_2, \ldots, X_n)$, and
consequently an infinite sequence of independent samples (i.e. samples
drawn under uniform conditions) will give rise to the population of
the statistic A, from which a sample of any finite size may be drawn.
We note that any particular observed value $a = a(x_1, x_2, \ldots, x_n)$ of A can
be regarded as a sample of unit size from the population of A.
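
The idea of the population of a statistic may be made concrete by simulation. In the sketch below, which rests on assumed conditions, X is the number shown by a fair die, the statistic is the sample mean, and m samples of fixed size n are drawn with a seeded pseudo-random generator; the m computed means then form a sample of size m from the population of the statistic.

```python
import random
import statistics

def draw_sample(n, rng):
    """One sample of size n from the population of X = number shown by a fair die."""
    return [rng.randint(1, 6) for _ in range(n)]

def simulate_means(n, m, rng):
    """Observed values of the sample mean for m independent samples of size n."""
    return [statistics.fmean(draw_sample(n, rng)) for _ in range(m)]

rng = random.Random(0)  # fixed seed so that the run is repeatable (assumption)
means = simulate_means(25, 2000, rng)
print(statistics.fmean(means), statistics.pvariance(means))
```

The mean of the simulated values lies close to the population mean 3.5, and their variance close to (35/12)/25, i.e. about 0.117, the variance of the mean of 25 independent throws.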


13.2 ESTIMATES—CONSISTENT AND UNBIASED

Let $\alpha$ be an unknown characteristic of the population. (...
unknown parameters, e.g. the population is known to be normal $(m, \sigma)$
where $m$, $\sigma$ are unknown parameters; $\alpha$ may also denote any such
parameter.) In order to determine an experimental value of $\alpha$ on
the basis of a sample, perhaps a natural suggestion is to find a statistic
$a = a(x_1, x_2, \ldots, x_n)$ whose computed value is approximately equal to $\alpha$.




This is, however, a very imprecise statement and means little
mathematically, for we know that the computed value of a statistic fluctuates
at random from sample to sample, and if the computed value of $a$ is
close to $\alpha$ for one sample, it may deviate considerably from $\alpha$ for
another sample. Thus it would be meaningless to infer anything
from a particular observed value of the statistic. A proper judgment
can, however, be obtained from the sampling distribution of the
statistic. If the probability mass in the sampling distribution of the
statistic $a$ is concentrated near the point $\alpha$, then we may say that it is
highly probable that an observed value of $a$ will lie in a given small
neighbourhood of $\alpha$. In this sense the statistic $a$ will be called an
estimate of the population characteristic or parameter $\alpha$. A measure
of goodness or precision of the estimate is naturally given by a measure
of concentration or inversely by a measure of dispersion of the sampling
distribution of the statistic $a$ about the point $\alpha$, and a useful
measure of this dispersion is furnished by $E\{(A - \alpha)^2\}$ or its positive
square root. The above definition of an estimate is very general and
somewhat loose, and accordingly, we may have different estimates of
the same population characteristic or parameter with varying degrees
of precision. An estimate $a_1$ of $\alpha$ will be said to be better than
another estimate $a_2$ if the sampling distribution of $a_1$ is more
concentrated about $\alpha$ than that of $a_2$, or if $E\{(A_1 - \alpha)^2\} < E\{(A_2 - \alpha)^2\}$.


There are several desirable types of estimates, of which we shall 
consider only two, viz. consistent and unbiased estimates. 


A statistic $a = a(x_1, x_2, \ldots, x_n)$ is said to be a consistent estimate of
a population characteristic or parameter $\alpha$ if

\[ A \to \alpha \ \text{in p} \quad \text{as } n \to \infty \qquad (13.2.1) \]

i.e. the precision of the estimate increases with the size of the sample,
and hence such an estimate is expected to give very accurate results
for large samples. The property of consistency is obviously a desirable
property for good estimates.


A statistic $a = a(x_1, x_2, \ldots, x_n)$ will be called an unbiased estimate
of a population characteristic or parameter $\alpha$ if

$$E(A) = \alpha \tag{13.2.2}$$




In case an estimate $a$ is such that $E(A) \neq \alpha$, then it is said to be biased.
The quantity $E(A) - \alpha$ is called the bias of the estimate $a$; the estimate
is said to be positively or negatively biased according as the bias is
positive or negative. In view of the importance of the mean as a
parameter of location in a probability distribution, the property of
unbiasedness immediately recommends itself for good estimates. For
an unbiased estimate $a$, a convenient inverse measure of precision is
provided by $\sigma(A)$.


Remark. The concepts of consistency and unbiasedness are
independent, i.e. one does not imply the other. A consistent estimate
may be biased and an unbiased estimate inconsistent. The bias of a
consistent estimate, however, decreases with increasing size of the
sample. An estimate which is only unbiased is not necessarily good,
for unbiasedness says nothing about the precision of the estimate.
Estimates which are both consistent and unbiased can certainly be
regarded as very good estimates.


In the previous chapter, we showed that the empirical distribution
of the sample is a statistical image of the population distribution, and
the reader may have wondered what the connections are between
the sample characteristics and those of the population. These can
now be precisely stated as the following theorems.


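Theorem I. $a_k$ is a consistent and unbiased estimate of $\alpha_k$,
provided the latter exists.

Proof. By (13.1.3) $A_k = \frac{1}{n}\sum_{i=1}^{n} X_i^k$ and for all $i$

$$E(X_i^k) = E(X^k) = \alpha_k$$

Since $X_1, X_2, \ldots, X_n$ are mutually independent, all having the
population distribution, $X_1^k, X_2^k, \ldots, X_n^k$ are also mutually
independent, all having the same distribution with mean $\alpha_k$; hence by
the Law of Large Numbers for equal components

$$A_k \xrightarrow[\text{in } p]{} \alpha_k \quad \text{as } n \to \infty$$

i.e. $a_k$ is a consistent estimate of $\alpha_k$.

Moreover

$$E(A_k) = \frac{1}{n}\sum_{i=1}^{n} E(X_i^k) = \alpha_k$$

which shows that the estimate is also unbiased.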


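Theorem II. If $\mu_k$ exists, $m_k$ is a consistent estimate of $\mu_k$.

Proof. From (12.4.5)

$$m_k = \sum_{i=0}^{k} (-1)^i \binom{k}{i} a_{k-i}\, a_1^{\,i}$$

If $\mu_k$ exists, all lower order moments also exist, and hence by
Theorem I

$$M_k \xrightarrow[\text{in } p]{} \sum_{i=0}^{k} (-1)^i \binom{k}{i} \alpha_{k-i}\, \alpha_1^{\,i} = \mu_k \quad \text{as } n \to \infty$$

This proves the theorem. We remark that the $m_k$'s are not always
unbiased estimates of the $\mu_k$'s as we shall presently see.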


COROLLARY. $g_1$ and $g_2$ are consistent estimates of $\gamma_1$ and $\gamma_2$
respectively.


Thus, speaking broadly, for large samples the characteristics of
the sample give approximate values of the corresponding
characteristics of the population.

We know that the sampling distributions of the characteristics
are uniquely determined by the distribution function of the population
$F(x)$. But the actual determinations of the distribution functions
of the sampling distributions in terms of $F(x)$ often present great
mathematical difficulties, and general analytical formulas for these are
mostly unknown. In the following section, we shall study some
properties of the sampling distributions, and work out exact results
for a particularly simple and important population, viz. the normal
population in the next.




13.3 IMPORTANT SAMPLING DISTRIBUTIONS 


Sample mean

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

Since $X_1, X_2, \ldots, X_n$ are mutually independent each having the
distribution of the population, we have by (8.5.6) and (8.5.8)

$$E(\bar{X}) = m, \qquad \sigma(\bar{X}) = \frac{\sigma}{\sqrt{n}} \tag{13.3.1}$$

assuming that the population mean and standard deviation exist.
By Theorem I the sample mean $\bar{x}$ is a consistent and unbiased estimate
of the population mean $m$, an inverse measure of precision of the
estimate being given by $\sigma(\bar{X}) = \sigma/\sqrt{n}$ which decreases as $n$ increases.
Also it follows from the Central Limit Theorem for equal
components that if $\sigma$ exists, $\bar{X}$ is asymptotically normal $(m, \sigma/\sqrt{n})$.


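Sample variance. Unbiased estimate of the population variance

$$S^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2$$

We may write

$$S^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - m)^2 - (\bar{x} - m)^2$$

So

$$E(S^2) = \frac{1}{n}\sum_{i=1}^{n} E\{(X_i - m)^2\} - E\{(\bar{X} - m)^2\} = \sigma^2 - \frac{\sigma^2}{n} \qquad [\text{by } (13.3.1)]$$

or

$$E(S^2) = \frac{n-1}{n}\,\sigma^2 \tag{13.3.2}$$

We know from Theorem II that $S^2$ is a consistent estimate of
$\sigma^2$; but (13.3.2) shows that the estimate is not unbiased. Since
$E(S^2) < \sigma^2$, the bias in this case is negative. Now if we set

$$s^2 = \frac{n}{n-1}\,S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2 \tag{13.3.3}$$

then

$$s^2 \xrightarrow[\text{in } p]{} \sigma^2 \quad \text{as } n \to \infty, \qquad E(s^2) = \sigma^2$$

i.e. $s^2$ is a consistent as well as an unbiased estimate of $\sigma^2$; $s^2$ will
be referred to as the unbiased estimate of the population variance.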
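
The effect of the divisor in (13.3.2) and (13.3.3) can be checked numerically. The following is a minimal simulation sketch, assuming a normal population with mean 0 (any population with a finite variance would serve equally well); the function names `sample_variances` and `average_variances` are illustrative and do not come from the text. The averages of $S^2$ and $s^2$ over many samples should lie close to $(n-1)\sigma^2/n$ and $\sigma^2$ respectively.

```python
import random


def sample_variances(xs):
    """Return (S2, s2): the divisor-n and divisor-(n-1) sample variances."""
    n = len(xs)
    mean = sum(xs) / n
    ss = sum((x - mean) ** 2 for x in xs)  # sum of squared deviations
    return ss / n, ss / (n - 1)


def average_variances(n, sigma, trials, seed=0):
    """Average S2 and s2 over repeated samples of size n (assumed normal)."""
    rng = random.Random(seed)
    total_S2 = 0.0
    total_s2 = 0.0
    for _ in range(trials):
        xs = [rng.gauss(0.0, sigma) for _ in range(n)]
        S2, s2 = sample_variances(xs)
        total_S2 += S2
        total_s2 += s2
    return total_S2 / trials, total_s2 / trials


if __name__ == "__main__":
    n, sigma = 5, 2.0
    mean_S2, mean_s2 = average_variances(n, sigma, trials=20000)
    print("average S^2:", mean_S2, " theory:", (n - 1) / n * sigma ** 2)
    print("average s^2:", mean_s2, " theory:", sigma ** 2)
```

The sample size is deliberately small, since the factor $(n-1)/n$ is then far from 1 and the bias of $S^2$ is easy to see.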




If $\sigma$ is known, we can calculate the value of $\sigma/\sqrt{n}$ which gives
an inverse measure of precision of the estimate $\bar{x}$ of the population
mean $m$. But if $\sigma$ is unknown, as is usually the case, this cannot
be done. In that case an approximate value of the same may be
obtained by replacing $\sigma$ by a good estimate, say, $S$ or $s$, i.e. $S/\sqrt{n}$
or $s/\sqrt{n}$ may be taken to be an approximate inverse measure of
precision of the estimate $\bar{x}$ of $m$.


13.4 NORMAL POPULATION 
It has been found from experience that a strikingly large variety of 
populations met in practice have normal or approximately normal 
distributions, e.g. populations of heights, weights etc. of racially
homogeneous people, temperatures, rainfalls etc. for a season,
experimentally measured values of a physical quantity, marks obtained in
an examination and so on. This may seem somewhat strange at 
first sight but can be largely accounted for by the unique position 
of the normal distribution offered by the Central Limit Theorem. 
In many cases (a significant example of which will be found in the 
theory of errors) the random variable in question may be conceived 
as the sum of a large number of independent random variables 
arising out of a large number of random causes, and hence is 
approximately normally distributed by virtue of the Central Limit 
Theorem. This is why the normal distribution assumes great 
importance in statistics. Luckily for us the calculations with the 
normal distribution are also comparatively easy, and we shall be 
able to find the exact forms of the sample distributions of the mean, 
variance etc. in the case of the normal population. 

Consider a normal $(m, \sigma)$ population. Then $X_1, X_2, \ldots, X_n$ is
a set of $n$ mutually independent variates, each normal $(m, \sigma)$, and
the following theorems hold.

Theorem I. The sample mean $\bar{X}$ is normal $(m, \sigma/\sqrt{n})$.

Proof. Observing that $\bar{X}$ is a linear combination of
$X_1, X_2, \ldots, X_n$, the theorem follows as a particular case of the
reproductive property of the normal distribution (cf. Sec. 8.8).

COROLLARY. The statistic $U = \dfrac{\sqrt{n}(\bar{X} - m)}{\sigma}$ is normal (0, 1).




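Theorem II. The statistic $\chi^2 = \dfrac{nS^2}{\sigma^2}$ has a $\chi^2$-distribution with
$\nu = n-1$ degrees of freedom, and the sample mean $\bar{X}$ and sample
variance $S^2$ are independent variates.

Proof. This theorem is a simple consequence of Theorem III
Sec. 9.1. We have

$$S^2 = \frac{1}{n}\sum_{i=1}^{n} (X_i - m)^2 - (\bar{X} - m)^2$$

or

$$\chi^2 = \frac{nS^2}{\sigma^2} = \sum_{i=1}^{n} \left(\frac{X_i - m}{\sigma}\right)^2 - \left\{\frac{\sqrt{n}(\bar{X} - m)}{\sigma}\right\}^2$$

Now $\dfrac{X_1 - m}{\sigma}, \dfrac{X_2 - m}{\sigma}, \ldots, \dfrac{X_n - m}{\sigma}$ are $n$ mutually independent standard
normal variates, and

$$\frac{\sqrt{n}(\bar{X} - m)}{\sigma} = \frac{1}{\sqrt{n}}\left(\frac{X_1 - m}{\sigma}\right) + \frac{1}{\sqrt{n}}\left(\frac{X_2 - m}{\sigma}\right) + \cdots + \frac{1}{\sqrt{n}}\left(\frac{X_n - m}{\sigma}\right)$$

is a linear combination of them such that

$$\left(\frac{1}{\sqrt{n}}\right)^2 + \left(\frac{1}{\sqrt{n}}\right)^2 + \cdots + \left(\frac{1}{\sqrt{n}}\right)^2 = 1$$

Hence, by Theorem III Sec. 9.1, $\chi^2$ is $\chi^2$-distributed with $\nu = n-1$
degrees of freedom and $\chi^2$ is independent of $U = \sqrt{n}(\bar{X} - m)/\sigma$ so that
$\bar{X}$ and $S^2$ are independent.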


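Distributions of $S^2$ and $s^2$. With the help of the above theorem,
we may now easily determine the density functions for $S^2$ and $s^2$. The
probability differential for $\chi^2$ is

$$dF = \frac{e^{-\frac{1}{2}\chi^2}\left(\frac{1}{2}\chi^2\right)^{\nu/2 - 1}}{2\,\Gamma\!\left(\frac{\nu}{2}\right)}\, d\chi^2 \qquad [\nu = n-1]$$

$$= \frac{1}{2\,\Gamma\!\left(\frac{n-1}{2}\right)}\, e^{-nS^2/2\sigma^2} \left(\frac{nS^2}{2\sigma^2}\right)^{(n-3)/2} \frac{n}{\sigma^2}\, dS^2$$

So

$$f(S^2) = \frac{1}{\Gamma\!\left(\frac{n-1}{2}\right)} \left(\frac{n}{2\sigma^2}\right)^{(n-1)/2} e^{-nS^2/2\sigma^2}\, (S^2)^{(n-3)/2} \qquad (0 < S^2 < \infty) \tag{13.4.1}$$

Similarly, noting that $\chi^2 = \nu s^2/\sigma^2$, we have

$$f(s^2) = \frac{1}{\Gamma\!\left(\frac{\nu}{2}\right)} \left(\frac{\nu}{2\sigma^2}\right)^{\nu/2} e^{-\nu s^2/2\sigma^2}\, (s^2)^{\nu/2 - 1} \qquad (0 < s^2 < \infty) \tag{13.4.2}$$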


Theorem III. The statistic $t = \dfrac{\sqrt{n}(\bar{x} - m)}{s}$, known as Student's
ratio, is $t$-distributed with $\nu = n-1$ degrees of freedom.

Proof. Since $\chi^2 = \nu s^2/\sigma^2$, we may write $t = \sqrt{\nu}\,U/\sqrt{\chi^2}$ where $U$ is
normal (0, 1), $\chi^2$ is $\chi^2$-distributed with $\nu$ degrees of freedom, and $U$
and $\chi^2$ are independent. Hence the above theorem follows from
Theorem I Sec. 9.2.

Remark. The statistics $U$, $\chi^2$ and $t$ introduced above will be
useful in the theories of estimation and testing of hypotheses for the
normal population. We note that the distribution of each of these
statistics is independent of the population parameters $m$, $\sigma$; but $U$
depends on both $m$ and $\sigma$, $\chi^2$ only on $\sigma$ and $t$ only on $m$.


13.5 EXERCISES 


1. Show that the coefficients of skewness and excess of the sampling distribution
of the mean are respectively $\gamma_1/\sqrt{n}$ and $\gamma_2/n$, $n$ being the size of the sample
and $\gamma_1$ and $\gamma_2$ the corresponding coefficients of the population.

2. Show that $\sigma(s^2)$ which serves as an inverse measure of precision of $s^2$, the
unbiased estimate of the population variance $\sigma^2$, is given by

$$\sigma^2(s^2) = \frac{1}{n}\left[\mu_4 - \frac{n-3}{n-1}\,\sigma^4\right]$$
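3. Prove the formulas

$$E(M_3) = \frac{(n-1)(n-2)}{n^2}\,\mu_3, \qquad E(M_4) = \frac{n-1}{n^3}\left\{(n^2 - 3n + 3)\mu_4 + 3(2n-3)\sigma^4\right\}$$

Hence deduce that

$$\frac{n^2 m_3}{(n-1)(n-2)} \quad \text{and} \quad \frac{n^2}{(n-1)(n-2)(n-3)}\left[(n+1)m_4 - 3(n-1)S^4\right]$$

are consistent and unbiased estimates of the cumulants $\kappa_3$ and $\kappa_4$ respectively.
Also show that the corrected estimates of $\gamma_1$ and $\gamma_2$ for small samples are
respectively

$$\frac{\sqrt{n(n-1)}}{n-2}\,g_1 \quad \text{and} \quad \frac{n-1}{(n-2)(n-3)}\left[(n+1)g_2 + 6\right]$$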




4. Show that $A_k$ is asymptotically normal $\left(\alpha_k, \sqrt{(\alpha_{2k} - \alpha_k^2)/n}\right)$ if $\alpha_{2k}$ exists.

5. Using the statistic $\chi^2$, prove that for a normal $(m, \sigma)$ population

$$\sigma(s^2) = \sqrt{\frac{2}{n-1}}\;\sigma^2$$

Verify the result from the general formula in Ex. 2. Hence show that an estimate
of $\sigma(s^2)$ is $\sqrt{2/(n-1)}\;s^2$.

6. Find the sampling distribution of the mean for the (a) binomial,
(b) Poisson and (c) gamma populations.

7. Show that the sample mean $\bar{X}$ and the sample variance $S^2$ are uncorrelated
if $\mu_3 = 0$.


CHAPTER 14 
ESTIMATION OF PARAMETERS 


14.1 METHOD OF MAXIMUM LIKELIHOOD


Let us suppose that the distribution function of the population, 
F(x) has a known functional form but contains a number of unknown 
parameters $\theta_1, \theta_2, \ldots, \theta_k$, and our problem is to find estimates of these
parameters on the basis of a sample $x_1, x_2, \ldots, x_n$ drawn from the
population. There are several methods by which such estimation can 
be done, of which the most important is the method of maximum 
likelihood. The importance of this method lies in the fact that in 
most cases it yields very good estimates. These estimates are found 
to be good by many yardsticks, but we do not here propose to enter 
into the mathematical discussions of the same. 


In this method our first task is to define what is called the
likelihood function of the sample. This is done separately for discrete
and continuous populations as follows.

DISCRETE CASE. Let $X$ denote the parent random variable, and,
for convenience, let us write

$$P(X = x) = f_x(\theta_1, \theta_2, \ldots, \theta_k) \tag{14.1.1}$$

The event that the particular sample $x_1, x_2, \ldots, x_n$ has been drawn is
$(X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n)$, and the probability of this event, which
is clearly a function of the sample values $x_1, x_2, \ldots, x_n$ and the
parameters $\theta_1, \theta_2, \ldots, \theta_k$, is defined to be the likelihood function of
the sample to be denoted by $L(x_1, x_2, \ldots, x_n;\ \theta_1, \theta_2, \ldots, \theta_k)$, i.e.

$$L(x_1, x_2, \ldots, x_n;\ \theta_1, \theta_2, \ldots, \theta_k) = P(X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n)$$

Now since $X_1, X_2, \ldots, X_n$ are mutually independent each having the
distribution of $X$, we have

$$L(x_1, x_2, \ldots, x_n;\ \theta_1, \theta_2, \ldots, \theta_k) = f_{x_1}(\theta_1, \ldots, \theta_k)\, f_{x_2}(\theta_1, \ldots, \theta_k) \cdots f_{x_n}(\theta_1, \ldots, \theta_k) \tag{14.1.2}$$
CONTINUOUS CASE. In this case the event of drawing the particular
sample may be represented by
$(x_1 < X_1 < x_1 + dx_1,\ x_2 < X_2 < x_2 + dx_2,\ \ldots,\ x_n < X_n < x_n + dx_n)$,
the probability of which is obviously
the probability differential of the sample point $x$, $f(x_1, x_2, \ldots, x_n)\,dx_1\,dx_2 \cdots dx_n$,
where $f(x_1, x_2, \ldots, x_n)$ is the density function of $x$. The density
function of $x$ will be, in the continuous case, defined to be the
likelihood function, i.e.

$$L(x_1, x_2, \ldots, x_n;\ \theta_1, \theta_2, \ldots, \theta_k) = f(x_1, x_2, \ldots, x_n)$$

Hence if $f(x;\ \theta_1, \theta_2, \ldots, \theta_k)$ denotes the density function of the
population, then

$$L(x_1, x_2, \ldots, x_n;\ \theta_1, \theta_2, \ldots, \theta_k) = f(x_1;\ \theta_1, \ldots, \theta_k)\, f(x_2;\ \theta_1, \ldots, \theta_k) \cdots f(x_n;\ \theta_1, \ldots, \theta_k) \tag{14.1.3}$$

When the sample values are regarded as fixed, the likelihood
function $L$ becomes a function of the parameters $\theta_1, \theta_2, \ldots, \theta_k$ only,
and the method of maximum likelihood consists in finding those values
of the parameters as functions of $x_1, x_2, \ldots, x_n$ which would maximise
the likelihood function. Thus if the function $L$ has a unique
maximum for

$$\theta_1 = \hat\theta_1(x_1, x_2, \ldots, x_n),\ \ \theta_2 = \hat\theta_2(x_1, x_2, \ldots, x_n),\ \ldots,\ \theta_k = \hat\theta_k(x_1, x_2, \ldots, x_n)$$

then the statistics $\hat\theta_1, \hat\theta_2, \ldots, \hat\theta_k$ are called the maximum likelihood
estimates of $\theta_1, \theta_2, \ldots, \theta_k$ respectively.

Since $L > 0$, maximising $L$ amounts to maximising $\log L$, the
equations for which are

$$\frac{\partial \log L}{\partial \theta_1} = 0,\quad \frac{\partial \log L}{\partial \theta_2} = 0,\ \ldots,\ \frac{\partial \log L}{\partial \theta_k} = 0 \tag{14.1.4}$$

These are called the likelihood equations, by solving which we can
find the maximum likelihood estimates of $\theta_1, \theta_2, \ldots, \theta_k$, provided they
exist.


14.2 APPLICATIONS TO DIFFERENT POPULATIONS 


1. Binomial $(N, p)$ population. Of the two parameters $N$ is
usually known, and our problem remains to estimate the parameter
$p$. Here

$$f_x(p) = \binom{N}{x} p^x (1-p)^{N-x}$$

By (14.1.2)

$$L = \binom{N}{x_1}\binom{N}{x_2} \cdots \binom{N}{x_n}\, p^{x_1 + x_2 + \cdots + x_n}\, (1-p)^{nN - (x_1 + x_2 + \cdots + x_n)}$$

So

$$\log L = (x_1 + x_2 + \cdots + x_n)\log p + \{nN - (x_1 + x_2 + \cdots + x_n)\}\log(1-p) + \text{terms independent of } p$$

The likelihood equation is $\dfrac{\partial \log L}{\partial p} = 0$ which gives

$$\frac{x_1 + x_2 + \cdots + x_n}{p} - \frac{nN - (x_1 + x_2 + \cdots + x_n)}{1-p} = 0$$

or

$$p = \frac{x_1 + x_2 + \cdots + x_n}{nN} = \frac{\bar{x}}{N}$$

Hence

$$\hat{p} = \bar{x}/N$$

We know that the sample mean $\bar{x}$ is a consistent and unbiased
estimate of the population mean which, in this case, is $Np$, and
hence it follows that $\hat{p}$ is a consistent and unbiased estimate of $p$.

If, instead of a sample of size $n$, we consider a single observed
value $\nu$ of the parent variable $X$, which may be regarded as a
sample of size 1, then we have

$$\hat{p} = \nu/N$$

The interpretation of the above result in terms of a Bernoullian
sequence of trials appears very plausible. We know that the number
of successes in a Bernoullian sequence of $N$ trials with probability
of success $p$ is binomial $(N, p)$, and the above result states that the
maximum likelihood estimate of the probability of success is equal
to the observed value of the frequency ratio of the same.

Remark. We have $E(X/N) = p$, and Bernoulli's theorem states
that $X/N \to p$ in $p$ as $N \to \infty$. These show that, for large $N$, $\hat{p} = \nu/N$ is a
good estimate of the parameter $p$.
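
The binomial result can be illustrated numerically. The following is a minimal sketch, assuming a small made-up sample (the values in `xs` and the choice $N = 10$ are hypothetical, as are the function names); it compares the closed-form estimate $\hat{p} = \bar{x}/N$ with a direct search for the maximum of $\log L$ over a grid of values of $p$.

```python
import math


def binomial_log_likelihood(p, xs, N):
    """log L for a binomial (N, p) sample, omitting terms independent of p."""
    total = sum(xs)
    n = len(xs)
    return total * math.log(p) + (n * N - total) * math.log(1.0 - p)


def binomial_mle(xs, N):
    """Closed-form maximum likelihood estimate p_hat = x_bar / N."""
    return sum(xs) / (len(xs) * N)


xs = [3, 5, 4, 6, 2]  # hypothetical sample values, each between 0 and N
N = 10                # number of trials, assumed known

p_hat = binomial_mle(xs, N)

# Grid search over 0 < p < 1 as an independent check of the closed form.
grid = [i / 1000 for i in range(1, 1000)]
best = max(grid, key=lambda p: binomial_log_likelihood(p, xs, N))

print("closed form:", p_hat)
print("grid search:", best)
```

The grid search is only a check; it requires that the sample total be neither 0 nor $nN$, since otherwise the maximum lies at an end point of the interval.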


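2. Poisson-$\mu$ population

$$f_x(\mu) = \frac{e^{-\mu}\mu^x}{x!}$$

So

$$L = \frac{e^{-n\mu}\,\mu^{x_1 + x_2 + \cdots + x_n}}{x_1!\,x_2! \cdots x_n!}$$

or

$$\log L = -n\mu + n\bar{x}\log\mu + \text{terms independent of } \mu$$

The equation $\dfrac{\partial \log L}{\partial \mu} = 0$ gives $\mu = \bar{x}$ or $\hat\mu = \bar{x}$.

Since $\mu$ is the mean of the population, its maximum likelihood
estimate $\hat\mu$ is both consistent and unbiased.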


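3. Normal $(m, \sigma)$ population

$$f(x;\ m, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x-m)^2/2\sigma^2}$$

By (14.1.3)

$$L = \frac{1}{(\sqrt{2\pi})^n \sigma^n}\, e^{-\frac{1}{2\sigma^2}\sum (x_i - m)^2}$$

So

$$\log L = -\tfrac{1}{2} n \log(2\pi) - n\log\sigma - \frac{1}{2\sigma^2}\sum (x_i - m)^2$$

The likelihood equations are

$$\frac{\partial \log L}{\partial m} = 0, \qquad \frac{\partial \log L}{\partial \sigma} = 0$$

The first equation gives $\sum (x_i - m) = 0$ or $m = \bar{x}$, and the second

$$-\frac{n}{\sigma} + \frac{1}{\sigma^3}\sum (x_i - m)^2 = 0$$

or

$$\sigma^2 = \frac{1}{n}\sum (x_i - m)^2 = \frac{1}{n}\sum (x_i - \bar{x})^2$$

Hence $\hat{m} = \bar{x}$, the sample mean, and $\hat\sigma^2 = S^2$, the sample variance, so
that $\hat\sigma = S$. We know that the estimate $\hat{m}$ is consistent and unbiased
whereas the estimate $\hat\sigma^2$ is consistent but biased.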
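
As a numerical illustration of the normal case, the sketch below evaluates $\log L$ from the formula above and the estimates $\hat{m} = \bar{x}$, $\hat\sigma^2 = S^2$. The data and the function names `normal_mle` and `normal_log_likelihood` are assumptions made for the example only. Perturbing either estimate should lower $\log L$.

```python
import math


def normal_mle(xs):
    """Maximum likelihood estimates (m_hat, sigma2_hat) for a normal sample."""
    n = len(xs)
    m_hat = sum(xs) / n
    sigma2_hat = sum((x - m_hat) ** 2 for x in xs) / n  # divisor n, i.e. S^2
    return m_hat, sigma2_hat


def normal_log_likelihood(xs, m, sigma):
    """log L = -(n/2) log(2 pi) - n log(sigma) - sum (x_i - m)^2 / (2 sigma^2)."""
    n = len(xs)
    ss = sum((x - m) ** 2 for x in xs)
    return -0.5 * n * math.log(2.0 * math.pi) - n * math.log(sigma) - ss / (2.0 * sigma ** 2)


xs = [62.2, 63.1, 65.5, 66.3]  # hypothetical observations
m_hat, sigma2_hat = normal_mle(xs)
sigma_hat = math.sqrt(sigma2_hat)

at_mle = normal_log_likelihood(xs, m_hat, sigma_hat)
print("m_hat =", m_hat, " sigma_hat^2 =", sigma2_hat)
print("log L at the estimates:", at_mle)
print("log L with m shifted:  ", normal_log_likelihood(xs, m_hat + 0.5, sigma_hat))
print("log L with sigma scaled:", normal_log_likelihood(xs, m_hat, 1.2 * sigma_hat))
```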




Remarks 

1. For a given set of parameters, we can construct different 
likelihood functions which may produce different estimates of the 
same parameters. Let us take the following example. 


For estimating the parameter $\sigma$ of the normal population, we
may, instead of the parent population, consider the population
of the statistic $s^2$ whose density function is given by (13.4.2).
From the given sample $x_1, x_2, \ldots, x_n$ the value of $s^2$ is calculated
which forms a sample of size 1 from the population of $s^2$, the
likelihood function for which is given by

$$L(s^2;\ \sigma) = f(s^2;\ \sigma) = \frac{1}{\Gamma\!\left(\frac{\nu}{2}\right)} \left(\frac{\nu}{2\sigma^2}\right)^{\nu/2} e^{-\nu s^2/2\sigma^2}\, (s^2)^{\nu/2 - 1}$$

So

$$\log L = -\nu\log\sigma - \frac{\nu s^2}{2\sigma^2} + \text{terms independent of } \sigma$$

The likelihood equation $\dfrac{\partial \log L}{\partial \sigma} = 0$ gives $\hat\sigma^2 = s^2$ which is a consistent
and unbiased estimate of $\sigma^2$.

2. In all the above examples, we may easily verify that the estimates
obtained actually correspond to a unique maximum of the likelihood
function.


14.3 INTERVAL ESTIMATION 


We have so far estimated a population parameter by means of a 
single statistic. Such estimation by a single statistic is called point 
estimation, and a point estimate when computed from an observed 
sample is supposed to give a value somewhat close to the true value 
of the estimated parameter. This, however, provides very insuffi- 
cient information about the true value of the parameter unless some 
measure of goodness or precision of the estimate is also given. We, 
of course, know that a useful inverse measure of precision or a
measure of uncertainty of an estimate $a$ of a parameter $\alpha$ is given
by $\sqrt{E\{(A-\alpha)^2\}}$ or the like. (In case the latter contains unknown
population parameters, we may calculate an approximate value of the
same by replacing the parameters by their estimates from the
sample.) Thus given such a measure of uncertainty, we can indeed
compare the goodness of different estimates of the same parameter, but
it still remains largely unknown how to make exact use of this measure 
as an error or correction of the estimate. It was found that this 
problem can only be satisfactorily tackled by the method of interval 
estimation which makes use of a pair of statistics forming an interval. 
The idea of interval estimation may be precisely stated as follows. 


Let $\alpha$ be a population parameter and $\varepsilon$ $(0 < \varepsilon < 1)$ a given
number. If there exist two statistics

$$a = a(x_1, x_2, \ldots, x_n) \quad \text{and} \quad b = b(x_1, x_2, \ldots, x_n)$$

such that

$$P(A < \alpha < B) = 1 - \varepsilon \tag{14.3.1}$$

where $A = a(X_1, X_2, \ldots, X_n)$ and $B = b(X_1, X_2, \ldots, X_n)$ are the random
variables corresponding to the statistics $a$ and $b$ respectively, then
the interval $(a, b)$ is called an interval estimate or a confidence
interval for the parameter $\alpha$ with confidence coefficient $1 - \varepsilon$; the
statistics $a$, $b$ are respectively called the lower and upper confidence
limits for $\alpha$.


The probability statement of (14.3.1) is somewhat unusual, the like of
which did not appear anywhere in the theory of probability, but it
nevertheless has a definite meaning. We note that here $A$ and $B$ are
random variables but $\alpha$ is a fixed constant, so that (14.3.1) states that
the probability that the random interval $(A, B)$ covers the point $\alpha$ is
$1 - \varepsilon$. A practical interpretation of this will be that if a long sequence
of random samples is drawn under uniform conditions and the
statistics $a$, $b$ computed each time, then the ratio of the number of times
the interval $(a, b)$ includes the true parameter value $\alpha$ to the total
number of samples drawn is approximately equal to $1 - \varepsilon$. The
number $\varepsilon$ is usually chosen to be small, say, .05 or .01 or .001 etc., i.e.
the confidence coefficient is .95 or .99 or .999 etc., the corresponding
confidence interval being then called a 95% or 99% or 99.9% etc.
confidence interval. For a 95% confidence interval, we may roughly say
that in repeatedly asserting that the parameter lies in the confidence
interval for a large number of samples, we are liable to a risk of error
only in 5% of the cases.
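
The frequency interpretation described above can be tried out by simulation. The sketch below is an illustration under stated assumptions: it takes a normal population with known $\sigma$ and uses the interval $\bar{x} \pm u_\varepsilon\sigma/\sqrt{n}$ obtained later in Sec. 14.5; the population values passed to the function and the name `coverage` are hypothetical. The returned ratio should be close to $1 - \varepsilon$.

```python
import random
from statistics import NormalDist


def coverage(m, sigma, n, eps, trials, seed=0):
    """Fraction of simulated samples whose interval covers the true mean m."""
    rng = random.Random(seed)
    u = NormalDist().inv_cdf(1.0 - eps / 2.0)  # P(U > u) = eps / 2
    half = u * sigma / n ** 0.5                # half-length of the interval
    hits = 0
    for _ in range(trials):
        xbar = sum(rng.gauss(m, sigma) for _ in range(n)) / n
        if xbar - half < m < xbar + half:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    # Hypothetical population: normal with m = 10, sigma = 2; samples of size 20.
    print(coverage(m=10.0, sigma=2.0, n=20, eps=0.05, trials=4000))
```

Each run of the loop corresponds to one sample in the "long sequence of random samples"; the true mean is fixed throughout and only the interval varies.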




It is also possible to find many confidence intervals for a parameter 
corresponding to a given sample and a given confidence coefficient. 
To compare the relative goodness of these, we can use the length of 
the interval, b—a as an inverse measure of precision of the interval 
estimate ; of two confidence intervals, the one having the smaller 
length is obviously preferable. 

For a given sample the length of a confidence interval depends on
$\varepsilon$; the dependence, in general, is such that as $\varepsilon$ decreases, i.e. the
confidence coefficient increases, the length of the interval also increases
making the estimate worse and worse. Thus in order to have a very
accurate estimate, we must agree to a low value of the confidence
coefficient, which is, however, not very useful from the practical point
of view. Hence for practical problems we must strike a balance
between the level of confidence and the precision of the interval
estimate. As remarked earlier, 95%, 99% etc. confidence intervals
are frequently used in practice.


14.4 METHOD FOR FINDING CONFIDENCE INTERVALS 


We shall here outline a method for obtaining confidence intervals 
which, although not perfectly general, is quite useful in many import- 
ant cases. For convenience, only continuous populations will be 


considered in the sequel. 

Let $\theta_1, \theta_2, \ldots, \theta_k$ be the unknown population parameters, of which
we want to estimate, say, $\theta_1$.

1. Choose, if possible, a statistic

$$z = z(x_1, x_2, \ldots, x_n;\ \theta_1) \tag{14.4.1}$$

whose sampling distribution is independent of all the parameters and
which itself depends on $\theta_1$ but is independent of $\theta_2, \theta_3, \ldots, \theta_k$; these
unwanted parameters $\theta_2, \theta_3, \ldots, \theta_k$ are often called nuisance parameters.

2. Now choose two numbers $\alpha_\varepsilon$, $\beta_\varepsilon$ $(> \alpha_\varepsilon)$ depending on $\varepsilon$ such
that

$$\int_{\alpha_\varepsilon}^{\beta_\varepsilon} f_Z(z)\,dz = 1 - \varepsilon \tag{14.4.2}$$

where $f_Z(z)$ is the density function of $Z$, which is independent of all
unknown parameters. This can usually be done in infinitely many
ways which would lead to infinitely many confidence intervals.

3. Eq. (14.4.2) states that

$$P(\alpha_\varepsilon < Z < \beta_\varepsilon) = 1 - \varepsilon \tag{14.4.3}$$

Now if the statistic $z$ is such a function of $\theta_1$ that the inequalities
$\alpha_\varepsilon < Z < \beta_\varepsilon$ can be inverted to the form $A < \theta_1 < B$ where $A$ and $B$
are random variables corresponding to the statistics $a$ and $b$
respectively which depend on $\varepsilon$, then (14.4.3) can be re-written as

$$P(A < \theta_1 < B) = 1 - \varepsilon$$

This shows that $(a, b)$ is a desired confidence interval for $\theta_1$ having
confidence coefficient $1 - \varepsilon$.


Remark. It is often difficult to find a suitable statistic $z$ as
described above, and this curtails the generality of the method to a great
extent.


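14.5 APPLICATIONS TO NORMAL $(m, \sigma)$ POPULATION

Confidence interval for $m$

Case I. $\sigma$ known. We can here conveniently choose the statistic

$$u = \frac{\sqrt{n}(\bar{x} - m)}{\sigma}$$

whose sampling distribution is normal (0, 1) and which depends on $m$,
the parameter to be estimated.

Take two points $\pm u_\varepsilon$ symmetrically about the origin such that

$$P(-u_\varepsilon < U < u_\varepsilon) = 1 - \varepsilon$$

or

$$P\left(-u_\varepsilon < \frac{\sqrt{n}(\bar{X} - m)}{\sigma} < u_\varepsilon\right) = 1 - \varepsilon$$

which can be re-written as

$$P\left(\bar{X} - \frac{\sigma u_\varepsilon}{\sqrt{n}} < m < \bar{X} + \frac{\sigma u_\varepsilon}{\sqrt{n}}\right) = 1 - \varepsilon$$

Hence a confidence interval for $m$ having confidence coefficient $1 - \varepsilon$ is

$$\left(\bar{x} - \frac{\sigma u_\varepsilon}{\sqrt{n}},\ \bar{x} + \frac{\sigma u_\varepsilon}{\sqrt{n}}\right) \tag{14.5.1}$$

where $u_\varepsilon$ is given by

$$P(-u_\varepsilon < U < u_\varepsilon) = 1 - \varepsilon$$

or, from the symmetry of the standard normal distribution, by

$$P(U > u_\varepsilon) = \tfrac{1}{2}\varepsilon \tag{14.5.2}$$

Fig. 25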


Remark. Instead of choosing the symmetrical points $\pm u_\varepsilon$, we
may also choose any two points unsymmetrically and obtain the
corresponding confidence interval by an exactly similar method. But
it can be proved that among all these the symmetrical points lead
to the shortest and hence the best interval.


Example 1. The mean of a sample of size 50 from a normal population is 
found to be 15.68. If it is known that the standard deviation of the population is 
3.27, find 95% and 99% confidence intervals for the population mean. 


For $\varepsilon = .05$, $u_\varepsilon$ is given by

$$P(U > u_\varepsilon) = .025$$

From Table I at the end of the book, $u_\varepsilon = 1.960$.

Now $n = 50$, $\bar{x} = 15.68$, $\sigma = 3.27$, so that the computed value of the confidence
interval (14.5.1) becomes (14.77, 16.59). This is a 95% interval for the population
mean.

Similarly, a 99% confidence interval is (14.49, 16.87).

The length of the 95% interval is 1.82, whereas that of the 99% interval is 2.38.
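
The arithmetic of Example 1 can be reproduced with a short script. This is a sketch rather than part of the original solution: the function name `mean_interval_known_sigma` is an assumption, and the percentage point $u_\varepsilon$ is computed with `statistics.NormalDist` from the Python standard library in place of Table I.

```python
from statistics import NormalDist


def mean_interval_known_sigma(xbar, sigma, n, eps):
    """Interval (14.5.1): xbar -/+ sigma * u_eps / sqrt(n), with P(U > u_eps) = eps/2."""
    u = NormalDist().inv_cdf(1.0 - eps / 2.0)
    half = sigma * u / n ** 0.5
    return xbar - half, xbar + half


for eps in (0.05, 0.01):
    lo, hi = mean_interval_known_sigma(xbar=15.68, sigma=3.27, n=50, eps=eps)
    print(f"{100 * (1 - eps):.0f}% interval: ({lo:.2f}, {hi:.2f}), length {hi - lo:.2f}")
```

With the data of the example this gives (14.77, 16.59) and (14.49, 16.87), in agreement with the values quoted above.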

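Case II. $\sigma$ unknown. Here $\sigma$ is a nuisance parameter, and the
statistic $u$ is no longer applicable as it contains $\sigma$. In this case our
choice falls on Student's ratio

$$t = \frac{\sqrt{n}(\bar{x} - m)}{s}$$

the sampling distribution of which, we know, is a $t$-distribution with
$\nu = n - 1$ degrees of freedom.

Determine two numbers $\pm t_\varepsilon$ by

$$P(-t_\varepsilon < t < t_\varepsilon) = 1 - \varepsilon$$

or

$$P\left(-t_\varepsilon < \frac{\sqrt{n}(\bar{X} - m)}{s} < t_\varepsilon\right) = 1 - \varepsilon$$

or

$$P\left(\bar{X} - \frac{s\,t_\varepsilon}{\sqrt{n}} < m < \bar{X} + \frac{s\,t_\varepsilon}{\sqrt{n}}\right) = 1 - \varepsilon$$

which shows that

$$\left(\bar{x} - \frac{s\,t_\varepsilon}{\sqrt{n}},\ \bar{x} + \frac{s\,t_\varepsilon}{\sqrt{n}}\right) \tag{14.5.3}$$

is a required confidence interval for $m$.

Here $t_\varepsilon$ is given by

$$P(-t_\varepsilon < t < t_\varepsilon) = 1 - \varepsilon$$

or

$$P(t > t_\varepsilon) = \tfrac{1}{2}\varepsilon \tag{14.5.4}$$

Remark. If the sample is large and $\sigma$ unknown, we may
also replace $\sigma$ in (14.5.1) by its estimate $s$ (or $S$) to give an
approximate confidence interval

$$\left(\bar{x} - \frac{s\,u_\varepsilon}{\sqrt{n}},\ \bar{x} + \frac{s\,u_\varepsilon}{\sqrt{n}}\right) \tag{14.5.5}$$

This was, in fact, the usual practice in older statistics when Student's
ratio was unknown. Now it can be theoretically proved that
$t$-distributions with large degrees of freedom approximate to the standard
normal distribution, and hence for large samples $t_\varepsilon \approx u_\varepsilon$ so that the
interval (14.5.5) is approximately the same as the exact result (14.5.3).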


Example 2. Seven laboratory determinations of the value of g, the acceleration
due to gravity at Calcutta, gave a mean 977.51 cm/sec² and a standard deviation


Now it is known that the Population of the measured values of any 
ity subject to experimental 




For $\varepsilon = .05$ and $\nu = 6$ degrees of freedom, by (14.5.4) and Table III, $t_\varepsilon = 2.447$.
Since $\bar{x} = 977.51$ and $S = 4.42$, a 95% confidence interval for the population mean
is (973.09, 981.93).
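
The interval quoted in Example 2 can be checked as follows. This is a hedged sketch: the function name is hypothetical, and the percentage point $t_\varepsilon = 2.447$ is taken from the text, not computed. It also assumes that 4.42 is the divisor-$n$ standard deviation $S$, so that $s/\sqrt{n} = S/\sqrt{n-1}$ by (13.3.3); under that reading (14.5.3) reproduces the stated limits.

```python
def mean_interval_student(xbar, S, n, t_eps):
    """Interval (14.5.3) written in terms of S: s / sqrt(n) equals S / sqrt(n - 1)."""
    half = t_eps * S / (n - 1) ** 0.5
    return xbar - half, xbar + half


# Data of Example 2; t_eps = 2.447 is the tabulated value quoted in the text
# for eps = .05 and 6 degrees of freedom.
lo, hi = mean_interval_student(xbar=977.51, S=4.42, n=7, t_eps=2.447)
print(f"({lo:.2f}, {hi:.2f})")
```

Passing the tabulated value as an argument keeps the sketch within the standard library, which has no $t$-distribution quantile function.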


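Confidence interval for $\sigma$

The suitable statistic is

$$\chi^2 = \frac{nS^2}{\sigma^2}$$

whose sampling distribution is a $\chi^2$-distribution with $\nu = n - 1$ degrees
of freedom.

Choose any positive number $\chi^2_{\varepsilon 1}$, and determine $\chi^2_{\varepsilon 2}$ by

$$P(\chi^2_{\varepsilon 1} < \chi^2 < \chi^2_{\varepsilon 2}) = 1 - \varepsilon$$

This equation gives $\chi^2_{\varepsilon 2}$ as a function of $\chi^2_{\varepsilon 1}$. Or we have

$$P\left(\chi^2_{\varepsilon 1} < \frac{nS^2}{\sigma^2} < \chi^2_{\varepsilon 2}\right) = 1 - \varepsilon$$

or

$$P\left(S\sqrt{\frac{n}{\chi^2_{\varepsilon 2}}} < \sigma < S\sqrt{\frac{n}{\chi^2_{\varepsilon 1}}}\right) = 1 - \varepsilon$$

Hence

$$\left(S\sqrt{\frac{n}{\chi^2_{\varepsilon 2}}},\ S\sqrt{\frac{n}{\chi^2_{\varepsilon 1}}}\right) \tag{14.5.6}$$

is a confidence interval for $\sigma$ having confidence coefficient $1 - \varepsilon$.

Now corresponding to different initial choices of $\chi^2_{\varepsilon 1}$ we shall get
different confidence intervals. Of these the shortest interval is obtained
by minimising the length of the interval (14.5.6), viz.

$$S\sqrt{n}\left(\frac{1}{\sqrt{\chi^2_{\varepsilon 1}}} - \frac{1}{\sqrt{\chi^2_{\varepsilon 2}}}\right)$$

as a function of $\chi^2_{\varepsilon 1}$. A practical determination of this shortest
interval will be, however, very complicated, and is usually
avoided in practice. It is instead customary to determine
$\chi^2_{\varepsilon 1}$, $\chi^2_{\varepsilon 2}$ from the simple equations

$$P(0 < \chi^2 < \chi^2_{\varepsilon 1}) = \tfrac{1}{2}\varepsilon$$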



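or

$$P\left(\frac{X}{n} - \frac{u_\varepsilon\sqrt{\nu(n-\nu)}}{n^{3/2}} < p < \frac{X}{n} + \frac{u_\varepsilon\sqrt{\nu(n-\nu)}}{n^{3/2}}\right) \approx 1 - \varepsilon$$

which gives the approximate confidence interval (14.6.3).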
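
The two approximate binomial intervals, the roots of the quadratic (14.6.2) and the simpler form (14.6.3), can be compared numerically. The sketch below assumes hypothetical data ($\nu = 40$ successes in $n = 100$ trials) and illustrative function names; $u_\varepsilon$ is obtained from `statistics.NormalDist` in place of Table I.

```python
from statistics import NormalDist


def binomial_interval_quadratic(v, n, eps):
    """Roots a, b of (v - n p)^2 = u^2 n p (1 - p), as in (14.6.2)."""
    u = NormalDist().inv_cdf(1.0 - eps / 2.0)
    centre = n * (2 * v + u * u)
    spread = u * (4 * n * v * (n - v) + n * n * u * u) ** 0.5
    denom = 2 * n * (n + u * u)
    return (centre - spread) / denom, (centre + spread) / denom


def binomial_interval_simple(v, n, eps):
    """Interval (14.6.3): v/n -/+ u * sqrt(v (n - v)) / n^(3/2)."""
    u = NormalDist().inv_cdf(1.0 - eps / 2.0)
    half = u * (v * (n - v)) ** 0.5 / n ** 1.5
    return v / n - half, v / n + half


# Hypothetical observation: 40 successes in 100 trials, eps = .05.
print(binomial_interval_quadratic(40, 100, 0.05))
print(binomial_interval_simple(40, 100, 0.05))
```

The two intervals differ only in terms of higher order than $1/\sqrt{n}$, so they draw closer together as $n$ grows.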

2. Confidence interval for the mean of any population for
large samples. Let us here digress a little from parametric
estimation to consider the problem of interval estimation of a
population characteristic, say, the mean for large samples. We know
that for large samples the sample mean $\bar{X}$ is approximately normal
$(m, \sigma/\sqrt{n})$, and replacing $\sigma$ by its estimate $s$ (or $S$), $\bar{X}$ can be taken to
be approximately normal $(m, s/\sqrt{n})$, or $\sqrt{n}(\bar{X} - m)/s$ to be approximately
normal (0, 1). Hence

$$P\left(-u_\varepsilon < \frac{\sqrt{n}(\bar{X} - m)}{s} < u_\varepsilon\right) \approx 1 - \varepsilon$$

where $u_\varepsilon$ is given by (14.6.1). This leads to the approximate
confidence interval

$$\left(\bar{x} - \frac{s\,u_\varepsilon}{\sqrt{n}},\ \bar{x} + \frac{s\,u_\varepsilon}{\sqrt{n}}\right) \tag{14.6.4}$$

for the population mean $m$.

This method is, of course, not restricted to the mean only, but may
be used for other population characteristics as well.


14.7 EXERCISES 


1. Estimate the parameter $\alpha$ of a continuous population having the density
function $(1+\alpha)x^\alpha$ $(0 < x < 1)$ by the method of maximum likelihood.

2. Prove that the maximum likelihood estimate of the parameter $a$ of a
population having density function $2(a - x)/a^2$ $(0 < x < a)$ for a sample of unit size
is $2x$, $x$ being the sample value, and show that the estimate is biased.

3. A random variable $X$ can take all non-negative integral values, and

$$P(X = i) = p(1-p)^i \qquad (i = 0, 1, 2, \ldots)$$

where $p$ $(0 < p < 1)$ is a parameter. Find the maximum likelihood estimate of $p$
on the basis of a sample of size $n$ from the population of $X$.

4. Estimate the parameter $\mu$ of the Pascal distribution (Ex. 8 Sec. 7.14) by the
method of maximum likelihood, and show that the estimate is unbiased and
consistent.




5. Find the maximum likelihood estimate of $\sigma^2$ for a normal $(m, \sigma)$
population if $m$ is known, and show that the estimate is unbiased and consistent.

6. Considering a sample of unit size from the population of the sample
mean for a normal $(m, \sigma)$ population, find the maximum likelihood estimate of
$m$, assuming $\sigma$ to be known.

7. Let $s_1^2, s_2^2, \ldots, s_k^2$ denote a sample of size $k$ from the population of the
unbiased estimate of the population variance for a normal population. Write
down the likelihood function for this sample, and show that the maximum
likelihood estimate of $\sigma^2$ is given by

$$\hat\sigma^2 = (s_1^2 + s_2^2 + \cdots + s_k^2)/k$$


8. A population is defined by the density function

$$f(x;\ \alpha) = \frac{x^{l-1} e^{-x/\alpha}}{\alpha^l\,\Gamma(l)} \qquad (0 < x < \infty)$$

$l$ being a known constant. Estimate the parameter $\alpha$ by the method of maximum
likelihood, and show that the estimate is consistent and unbiased.

9. Show that approximate confidence limits for large samples for the
parameter $\mu$ of a Poisson population having confidence coefficient $1 - \varepsilon$ are given by
the roots of the quadratic equation in $\mu$:

$$n(\bar{x} - \mu)^2 = u_\varepsilon^2\,\mu$$

which, to the order of $1/\sqrt{n}$, are approximately

$$\bar{x} \pm u_\varepsilon\sqrt{\bar{x}/n}$$

where $u_\varepsilon$ is given by (14.6.1).
10. Show that 


7 lie 

TE) 
are approximate large-sample confidence limits for the parameter $\alpha$ of the
population defined in Ex. 8, $u_\varepsilon$ being given by (14.6.1).

11. The population of scores of 10-year old children in a psychological
performance (Dearborn Formboard) test is known to have a standard deviation
5.2. If a random sample of size 20 shows a mean of 16.9, find 95% confidence
limits for the mean score of the population, assuming that the population is
normal.

12. The marks obtained by 17 candidates in an examination have a mean 57 
and variance 64. Find 99% confidence limits for the mean of the population of 
marks, assuming it to be normal. 


13. The heights in inches of 8 students of a college, chosen at random, were
as follows: 62.2, 62.4, 63.1, 63.2, 65.5, 66.2, 66.3, 66.5. Compute 95% and 98%


confidence intervals for the mean and standard deviation of the population of 


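or

$$P(\chi^2 > \chi^2_{\varepsilon 1}) = 1 - \tfrac{1}{2}\varepsilon \quad \text{and} \quad P(\chi^2 > \chi^2_{\varepsilon 2}) = \tfrac{1}{2}\varepsilon \tag{14.5.7}$$

which state that the areas of the tails of the $\chi^2$-density curve on the
left of $\chi^2_{\varepsilon 1}$ and the right of $\chi^2_{\varepsilon 2}$ are equal, each being equal to $\tfrac{1}{2}\varepsilon$.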


Example 3. In Ex. 2 find 95% confidence interval for the population standard
deviation which, according to the theory of errors, gives an inverse measure of
precision of the measuring process.

Here $\varepsilon = .05$, $n = 7$, $S = 4.42$; $\chi^2_{\varepsilon 1}$ and $\chi^2_{\varepsilon 2}$ are given by

$$P(\chi^2 > \chi^2_{\varepsilon 1}) = .975, \qquad P(\chi^2 > \chi^2_{\varepsilon 2}) = .025$$

corresponding to $\nu = 6$ degrees of freedom. From Table II, $\chi^2_{\varepsilon 1} = 1.218$, $\chi^2_{\varepsilon 2} = 14.626$,
and a 95% confidence interval for $\sigma$, the population standard deviation, is (3.06,
10.60).
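
The limits in Example 3 follow from (14.5.6) by direct substitution, as the sketch below shows. The function name `sigma_interval` is an assumption, and the two percentage points are taken exactly as quoted in the text from Table II; they are not recomputed here.

```python
def sigma_interval(S, n, chi2_lower, chi2_upper):
    """Interval (14.5.6): (S * sqrt(n / chi2_upper), S * sqrt(n / chi2_lower))."""
    nS2 = n * S * S
    return (nS2 / chi2_upper) ** 0.5, (nS2 / chi2_lower) ** 0.5


# Data of Example 3, with the tabulated values as quoted in the text.
lo, hi = sigma_interval(S=4.42, n=7, chi2_lower=1.218, chi2_upper=14.626)
print(f"({lo:.2f}, {hi:.2f})")
```

Note that the larger percentage point gives the lower limit for $\sigma$, since $\sigma$ appears in the denominator of $\chi^2 = nS^2/\sigma^2$.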


14.6 APPROXIMATE CONFIDENCE INTERVALS 


1. Binomial $(n, p)$ population. Let $\nu$ be an observed value of the
parent random variable $X$, i.e. a sample of unit size from the
corresponding population, and we propose to find an approximate
confidence interval for $p$, assuming that $n$ (known) is large.

By DeMoivre-Laplace theorem, for large $n$ the distribution of the
variate

$$\frac{X - np}{\sqrt{np(1-p)}}$$

is approximately normal (0, 1). Hence if the points $\pm u_\varepsilon$ are
determined by

$$\int_{-u_\varepsilon}^{u_\varepsilon} \phi(x)\,dx = 1 - \varepsilon$$

or

$$\int_{u_\varepsilon}^{\infty} \phi(x)\,dx = \tfrac{1}{2}\varepsilon \tag{14.6.1}$$

where $\phi(x)$ is the standard normal density function, then

$$P\left(-u_\varepsilon < \frac{X - np}{\sqrt{np(1-p)}} < u_\varepsilon\right) \approx \int_{-u_\varepsilon}^{u_\varepsilon} \phi(x)\,dx = 1 - \varepsilon$$

This can be re-set in the form

$$P(A < p < B) \approx 1 - \varepsilon$$

where $A$, $B$ are the roots of the quadratic equation in $p$:

$$(X - np)^2 = u_\varepsilon^2\, np(1-p)$$

so that an approximate confidence interval for $p$ is $(a, b)$ where $a$, $b$
are the roots of the equation

$$(\nu - np)^2 = u_\varepsilon^2\, np(1-p)$$

or

$$a,\ b = \frac{n(2\nu + u_\varepsilon^2) \mp u_\varepsilon\sqrt{4n\nu(n-\nu) + n^2 u_\varepsilon^2}}{2n(n + u_\varepsilon^2)} \tag{14.6.2}$$


Now if we calcutate only up to the order of 1/ Jn, 
„7 v(n - у) 
a, b = n swf UO CY 
ie. an approximate confidence interval for p is 
LAT v(n — у) Y v(n — v) 
(2 -u OS™. р + л 57 ) (14.6.3) 
Another method, The result (14.6.3) can be deduced more easily 
from a slightly different point of view. If nis large, we know that 
X is approximately normal (np, /пр(1- p). Now the standard 
deviation contains the parameter p which if replaced by its estimate 
pevin gives an approximate value of the standard deviation to be 
Jv(n—y)/n. Thus in this two-fold approximation, one can take X to 
be approximately normal (пр, „/у(и = у)/т) for large values of n. 
Hence the variate 
X-np 
„y(n — у)/п 


is approximately standard normal, and if и, is given by (14.6.1), we 


have 


m E ) = = 
P( ис < diane a æ l-e 
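To see how close the simplified interval (14.6.3) is to the roots (14.6.2), the following sketch evaluates both. The data ($\nu = 160$, $n = 400$, $\varepsilon = 0.05$) and the function names are hypothetical, chosen only for illustration; $u_\varepsilon$ is obtained from the standard normal quantile function in Python's `statistics` module.

```python
import math
from statistics import NormalDist

def binomial_ci_exact(v, n, eps):
    """Roots a, b of (v - n p)^2 = u^2 n p (1 - p), as in (14.6.2)."""
    u = NormalDist().inv_cdf(1 - eps / 2)
    centre = n * (2 * v + u * u)
    spread = u * math.sqrt(4 * n * v * (n - v) + n * n * u * u)
    denom = 2 * n * (n + u * u)
    return (centre - spread) / denom, (centre + spread) / denom

def binomial_ci_approx(v, n, eps):
    """Simplified interval (14.6.3), correct to order 1/sqrt(n)."""
    u = NormalDist().inv_cdf(1 - eps / 2)
    half = u * math.sqrt(v * (n - v) / n**3)
    return v / n - half, v / n + half

# Hypothetical data: v = 160 successes in n = 400 trials, eps = 0.05.
exact = binomial_ci_exact(160, 400, 0.05)
approx = binomial_ci_approx(160, 400, 0.05)
print(exact, approx)
```

For a sample of this size the two intervals differ only in the third decimal place, which is the sense in which (14.6.3) is adequate for large $n$.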




heights of the students of the college, assuming it to be normal, and find the length of the interval in each case.

14. 171 out of 300 voters picked at random from a large electorate said that they were going to vote for a particular candidate. Find 95% confidence interval for the proportion of voters of the electorate who would vote in favour of the candidate.

15. In Ex. 3 Sec. 12.6 find 99% confidence limits for the probability of obtaining 'six' in a throw with the die.

16. A sample of size 500 from a Poisson-$\mu$ population has a mean of 4.78. Calculate 95% confidence limits for $\mu$.

17. In Ex. 4 Sec. 12.6 find a 95% confidence interval for the mean number of daily telephone calls, (a) assuming that the corresponding population has a Poisson distribution and (b) without assuming anything regarding the population distribution.

18. The weekly wages of 144 workers of a large factory were recorded, and the sample mean and standard deviation were found to be Rs. 23.52 and Rs. 6.71 respectively. Find 95% confidence limits for the mean wage. (Do not assume that the population of wages is normal.)

19. From two normal populations having parameters $(m_1, \sigma)$ and $(m_2, \sigma)$, two independent samples of sizes $n_1$ and $n_2$ are respectively drawn. Find confidence limits for $m_1 - m_2$ having confidence coefficient $1 - \varepsilon$. (Use the theorem of Sec. 16.7.)

20. Let $\nu_1$ and $\nu_2$ be independent samples of unit size from two binomial populations $(n_1, p_1)$ and $(n_2, p_2)$ respectively, where $n_1$, $n_2$ are known and large. Prove that confidence limits for $p_1 - p_2$ having confidence coefficient $1 - \varepsilon$ are approximately
$$\frac{\nu_1}{n_1} - \frac{\nu_2}{n_2} \pm u_\varepsilon\sqrt{\frac{\nu_1(n_1 - \nu_1)}{n_1^3} + \frac{\nu_2(n_2 - \nu_2)}{n_2^3}}$$
where $u_\varepsilon$ is given by (14.6.1).


CHAPTER 15 
BIVARIATE SAMPLES 


15.1 SAMPLE FROM A BIVARIATE POPULATION 


Let $X$ and $Y$ be a pair of random variables defined on the event space of a random experiment $E$. Any performance of $E$ will give an observed value of the two-dimensional variable $(X, Y)$, and corresponding to $n$ independent repetitions of $E$ we get $n$ observed values
$$(x_1, y_1),\ (x_2, y_2),\ \ldots,\ (x_n, y_n)$$
of $(X, Y)$, which is a sample of size $n$ from the bivariate population. The sample being random, the above sample values may be regarded as observed values of the $n$ two-dimensional random variables
$$(X_1, Y_1),\ (X_2, Y_2),\ \ldots,\ (X_n, Y_n)$$
respectively, which are mutually independent, all having the same distribution function, viz. the distribution function of the population, $F(x, y)$.

Fig. 28. Dot Diagrams for Different Values of r

By plotting the points $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ we obtain a diagram known as the dot diagram or scatter diagram of the sample. The scatter diagram usually looks like a cluster of dots on the xy-plane, particularly if the sample is from a continuous population, and provides a simple but useful graphical representation of the data. We may also construct the three-dimensional analogues of the frequency diagram, histogram etc., but these are rather unwieldy.


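The empirical distribution of the sample is obtained by placing a probability mass $1/n$ at each observed point $(x_i, y_i)$ $(i = 1, 2, \ldots, n)$. Let $(x, y)$ denote the hypothetical random variable associated with the empirical distribution. The characteristics of $(x, y)$ are then the sample characteristics by definition, the most important of which are the following.

means:
$$\bar{x} = E(x) = \frac{1}{n}\sum_i x_i, \quad \bar{y} = E(y) = \frac{1}{n}\sum_i y_i \tag{15.1.1}$$

variances:
$$S_x^2 = E\{(x - \bar{x})^2\} = \frac{1}{n}\sum_i (x_i - \bar{x})^2, \quad S_y^2 = E\{(y - \bar{y})^2\} = \frac{1}{n}\sum_i (y_i - \bar{y})^2 \tag{15.1.2}$$

moments:
$$a_{jk} = E(x^j y^k) = \frac{1}{n}\sum_i x_i^j y_i^k \tag{15.1.3}$$
So
$$a_{00} = 1, \quad a_{10} = \bar{x}, \quad a_{01} = \bar{y}$$

central moments:
$$m_{jk} = E\{(x - \bar{x})^j (y - \bar{y})^k\} = \frac{1}{n}\sum_i (x_i - \bar{x})^j (y_i - \bar{y})^k \tag{15.1.4}$$
$$m_{00} = 1, \quad m_{10} = m_{01} = 0; \quad m_{20} = S_x^2, \quad m_{02} = S_y^2$$

covariance:
$$m_{11} = \frac{1}{n}\sum_i (x_i - \bar{x})(y_i - \bar{y}) \tag{15.1.5}$$

correlation coefficient:
$$r = \frac{m_{11}}{S_x S_y} \tag{15.1.6}$$

Also we have the formulas:
$$S_x^2 = a_{20} - \bar{x}^2, \quad S_y^2 = a_{02} - \bar{y}^2, \quad m_{11} = a_{11} - \bar{x}\bar{y} \tag{15.1.7}$$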
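As a small illustration of the definitions above, the sketch below computes the sample characteristics of a made-up set of five points through the moment formulas (15.1.7). The data and the function name `sample_characteristics` are hypothetical; note that the divisor is $n$ throughout, as in the text.

```python
import math

def sample_characteristics(xs, ys):
    """Means, standard deviations, covariance and r with divisor n."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    a20 = sum(x * x for x in xs) / n
    a02 = sum(y * y for y in ys) / n
    a11 = sum(x * y for x, y in zip(xs, ys)) / n
    sx = math.sqrt(a20 - x_bar**2)   # (15.1.7)
    sy = math.sqrt(a02 - y_bar**2)
    m11 = a11 - x_bar * y_bar
    return {"x_bar": x_bar, "y_bar": y_bar, "Sx": sx, "Sy": sy,
            "m11": m11, "r": m11 / (sx * sy)}

# Hypothetical sample of five points.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 5.0]
stats = sample_characteristics(xs, ys)
print(stats)
```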




15.2 PRACTICAL COMPUTATION 


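Discrete population. When the population is discrete, the sets of values $x_1, x_2, \ldots, x_n$ and $y_1, y_2, \ldots, y_n$ generally contain many repetitions. Let the distinct x- and y-values be $\xi_j$ $(j = 1, 2, \ldots)$ and $\eta_k$ $(k = 1, 2, \ldots)$, and let $\nu_{jk}$ denote the frequency of occurrence of $(\xi_j, \eta_k)$ in the sample. (If, however, a particular combination of $\xi_j$ with $\eta_k$ does not occur, the corresponding frequency is taken to be zero.) The data can now be arranged in a two-way frequency table showing $\nu_{jk}$ columnwise against $\xi_j$ and rowwise against $\eta_k$. Such a table is sometimes called a correlation table in view of the fact that we are mostly interested in computing the sample correlation coefficient $r$ from this table.

If $\nu_{xj}$ denotes the frequency of $\xi_j$ in the set $x_1, x_2, \ldots, x_n$ and $\nu_{yk}$ that of $\eta_k$ in the set $y_1, y_2, \ldots, y_n$, then
$$\nu_{xj} = \sum_k \nu_{jk}, \quad \nu_{yk} = \sum_j \nu_{jk}$$
so that
$$\sum_j \nu_{xj} = \sum_k \nu_{yk} = \sum_j\sum_k \nu_{jk} = n$$
Hence
$$\bar{x} = \frac{1}{n}\sum_j \nu_{xj}\,\xi_j, \quad \bar{y} = \frac{1}{n}\sum_k \nu_{yk}\,\eta_k \tag{15.2.1}$$
$$a_{20} = \frac{1}{n}\sum_j \nu_{xj}\,\xi_j^2, \quad a_{02} = \frac{1}{n}\sum_k \nu_{yk}\,\eta_k^2 \tag{15.2.2}$$
and
$$a_{11} = \frac{1}{n}\sum_j\sum_k \nu_{jk}\,\xi_j\eta_k$$
Set
$$p_k = \sum_j \nu_{jk}\,\xi_j, \quad q_j = \sum_k \nu_{jk}\,\eta_k \tag{15.2.3}$$
so that
$$\sum_k p_k = \sum_j\sum_k \nu_{jk}\,\xi_j = \sum_j \nu_{xj}\,\xi_j$$
$$\sum_j q_j = \sum_j\sum_k \nu_{jk}\,\eta_k = \sum_k \nu_{yk}\,\eta_k$$
and
$$\sum_k p_k\eta_k = \sum_j q_j\xi_j = \sum_j\sum_k \nu_{jk}\,\xi_j\eta_k$$
Then
$$a_{11} = \frac{1}{n}\sum_k p_k\eta_k = \frac{1}{n}\sum_j q_j\xi_j \tag{15.2.4}$$

To the given table we add columns for $\nu_y$, $\nu_y\eta$, $\nu_y\eta^2$, $p$, $p\eta$ and rows for $\nu_x$, $\nu_x\xi$, $\nu_x\xi^2$, $q$, $q\xi$ and obtain the respective totals. The following identities may be used as checks on the computation.

CHECK FORMULAS
$$\text{(i)}\ \sum_j \nu_{xj} = \sum_k \nu_{yk} \qquad \text{(ii)}\ \sum_k p_k = \sum_j \nu_{xj}\,\xi_j$$
$$\text{(iii)}\ \sum_j q_j = \sum_k \nu_{yk}\,\eta_k \qquad \text{(iv)}\ \sum_k p_k\eta_k = \sum_j q_j\xi_j \tag{15.2.5}$$

$\bar{x}$, $\bar{y}$, $a_{20}$, $a_{02}$ and $a_{11}$ are easily calculated from the table; $S_x$, $S_y$ and $m_{11}$ are then given by (15.1.7).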
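The marginal totals, the auxiliary sums $p_k$, $q_j$ and the check formulas (15.2.5) can be sketched as follows. The small $2 \times 3$ table and the function name `table_moments` are hypothetical; integer values are used so that the four checks hold exactly.

```python
def table_moments(xi, eta, freq):
    """Moments from a correlation table.

    freq[k][j] is the frequency of (xi[j], eta[k]): rows follow eta,
    columns follow xi, as in the two-way table described above.
    """
    n = sum(sum(row) for row in freq)
    nu_x = [sum(row[j] for row in freq) for j in range(len(xi))]
    nu_y = [sum(row) for row in freq]
    p = [sum(f * x for f, x in zip(row, xi)) for row in freq]      # p_k
    q = [sum(freq[k][j] * eta[k] for k in range(len(eta)))
         for j in range(len(xi))]                                   # q_j
    sum_p_eta = sum(pk * y for pk, y in zip(p, eta))
    sum_q_xi = sum(qj * x for qj, x in zip(q, xi))
    checks = (
        sum(nu_x) == sum(nu_y) == n,                                # (i)
        sum(p) == sum(f * x for f, x in zip(nu_x, xi)),             # (ii)
        sum(q) == sum(f * y for f, y in zip(nu_y, eta)),            # (iii)
        sum_p_eta == sum_q_xi,                                      # (iv)
    )
    return sum(p) / n, sum(q) / n, sum_p_eta / n, checks

# Hypothetical 2 x 3 table with integer values, so the checks are exact.
xi = [1, 2, 3]
eta = [10, 20]
freq = [[2, 1, 0],
        [1, 3, 3]]
x_bar, y_bar, a11, checks = table_moments(xi, eta, freq)
print(x_bar, y_bar, a11, checks)
```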

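If convenient, we can make linear transformations
$$x_i = a x_i' + b, \quad y_i = c y_i' + d$$
i.e.
$$\xi_j = a\xi_j' + b, \quad \eta_k = c\eta_k' + d \tag{15.2.6}$$
for which the transformation formulas are
$$\bar{x} = a\bar{x}' + b, \quad \bar{y} = c\bar{y}' + d$$
$$S_x = |a|\,S_x', \quad S_y = |c|\,S_y', \quad r = r' \ \ (\text{if } ac > 0) \tag{15.2.7}$$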


Continuous population. A sample from a continuous population usually consists of more or less distinct values, and if the size of the sample is not very large, we can draw up a two-column xy-table. For calculating the above characteristics, columns for $x^2$, $y^2$ and $xy$ only have to be added. We may also make linear transformations if found suitable.

If, however, the sample is large, then grouping is necessary. Both the x- and y-values are grouped into classes and the results exhibited in a grouped correlation table. The sample characteristics can then be approximately calculated by treating the class mid-points for the x- and y-values as the $\xi$'s and $\eta$'s respectively of the discrete case.


Example. The following correlation table shows the heights (in.) and weights (lb.) of 114 adult males. Calculate the sample correlation coefficient between height and weight.


Height (in.) 
я ERY. Eis 1i- 62l- EN E. = = 1992 
Ve Xd uod d FFT ome 
| 804- 90 1 Б 
904-100 — 1 y" z P 
1004-110 b. € # LX 'l £ 
| 1104-120 yr 4 *.9 bee 22 
| 1204-130 i. Bh f Ж К 
1304-140 yo oW ш n 
1403-150 2 з 5 а 9.9 
1504-160 1, -2 91 l ? 
1604-170 ig à x3 6 
1704-180 1 S 
1803-190 1 1d = 
1903-200 1 n 


Total 1 1 8 16 34 27 16 6 5 114 
The tabular computation is shown in the table that follows.
By (15.2.1), (15.2.2) and (15.2.4)
$$\bar{x}' = 0.508772, \quad \bar{y}' = -0.535088$$
$$a_{20}' = 2.614035, \quad a_{02}' = 4.464912$$
$$a_{11}' = 1.350877$$
By (15.1.7)
$$S_x'^2 = 2.355186 \ \text{ or } \ S_x' = 1.534661$$
$$S_y'^2 = 4.178593 \ \text{ or } \ S_y' = 2.044161$$
$$m_{11}' = 1.623115$$
which give
$$r' = 0.517394$$
Hence the correlation coefficient together with other characteristics of the sample are given according to (15.2.7) by
$$\bar{x} = 66.080044, \quad \bar{y} = 129.89912$$
$$S_x = 3.069322, \quad S_y = 20.44161$$
$$r = 0.517394$$
or
$$\bar{x} = 66.08, \quad \bar{y} = 129.90, \quad S_x = 3.07, \quad S_y = 20.44, \quad r = 0.517$$
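The step from the moments of the transformed values to the final characteristics can be retraced numerically. In the sketch below the five moments are those reported in the example; the constants of the transformation ($a = 2$, $b = 65.0625$, $c = 10$, $d = 135.25$) are an assumption inferred from the reported results, not quoted from the text.

```python
import math

# Moments of the transformed values, as reported in the example.
x_bar_t, y_bar_t = 0.508772, -0.535088
a20_t, a02_t, a11_t = 2.614035, 4.464912, 1.350877

sx_t = math.sqrt(a20_t - x_bar_t**2)      # (15.1.7)
sy_t = math.sqrt(a02_t - y_bar_t**2)
m11_t = a11_t - x_bar_t * y_bar_t
r = m11_t / (sx_t * sy_t)

# Assumed transformation x = a x' + b, y = c y' + d; these constants are
# inferred from the reported results, not quoted from the text.
a, b, c, d = 2.0, 65.0625, 10.0, 135.25
x_bar, y_bar = a * x_bar_t + b, c * y_bar_t + d   # (15.2.7)
sx, sy = abs(a) * sx_t, abs(c) * sy_t
print(round(x_bar, 2), round(y_bar, 2), round(sx, 2), round(sy, 2), round(r, 3))
```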


[Table: tabular computation for the above example, carried out on the transformed values defined by $x = 2x' + 65\tfrac{1}{16}$, $y = 10y' + 135\tfrac{1}{4}$.]




15.3 LEAST SQUARE CURVE FITTING 


For fitting a curve of the type
$$y = g(x;\ c_0, c_1, \ldots)$$
where $c_0, c_1, \ldots$ are unknown parameters, to the empirical sample distribution, i.e. to the observed set of points $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, by the principle of least squares, we have to minimise
$$E\{y - g(x;\ c_0, c_1, \ldots)\}^2 = \frac{1}{n}\sum_i \{y_i - g(x_i;\ c_0, c_1, \ldots)\}^2$$
or to minimise $\sum_i \{y_i - g(x_i;\ c_0, c_1, \ldots)\}^2$ as a function of $c_0, c_1, \ldots$ This is the empirical form of the principle of least squares, which consists in making the sum of the squares of the deviations of the observed points from the curve, measured in the direction of the y-axis, a minimum.

Regression lines. It follows from the general theory (Sec. 8.13) that the regression lines of the sample are
$$y - \bar{y} = b_{yx}(x - \bar{x})$$
and
$$x - \bar{x} = b_{xy}(y - \bar{y}) \tag{15.3.1}$$
the former being the regression line of $Y$ on $X$ and the latter that of $X$ on $Y$, where
$$b_{yx} = r\frac{S_y}{S_x}, \quad b_{xy} = r\frac{S_x}{S_y} \tag{15.3.2}$$
are the regression coefficients of the sample.

It also follows that $|r|$ gives a measure of goodness of fit of the regression lines to the observed points.

Example 1. Find the regression lines of the sample in the example of Sec. 15.2.

Here
$$b_{yx} = 3.4458, \quad b_{xy} = 0.07768702$$
and hence the regression lines are
$$y - 129.90 = 3.45(x - 66.08)$$
$$x - 66.08 = 0.0777(y - 129.90)$$



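Parabolic curve fitting. The normal equations for fitting a kth degree parabola
$$y = c_0 + c_1x + c_2x^2 + \cdots + c_kx^k \tag{15.3.3}$$
to the observed points are, by (8.14.4),
$$c_0^*a_{00} + c_1^*a_{10} + c_2^*a_{20} + \cdots + c_k^*a_{k0} = a_{01}$$
$$c_0^*a_{10} + c_1^*a_{20} + c_2^*a_{30} + \cdots + c_k^*a_{k+1,0} = a_{11}$$
$$\cdots\cdots \tag{15.3.4}$$
$$c_0^*a_{k0} + c_1^*a_{k+1,0} + c_2^*a_{k+2,0} + \cdots + c_k^*a_{2k,0} = a_{k1}$$
Or, multiplying all the equations by $n$, we obtain
$$nc_0^* + c_1^*\sum x_i + c_2^*\sum x_i^2 + \cdots + c_k^*\sum x_i^k = \sum y_i$$
$$c_0^*\sum x_i + c_1^*\sum x_i^2 + c_2^*\sum x_i^3 + \cdots + c_k^*\sum x_i^{k+1} = \sum x_iy_i$$
$$\cdots\cdots \tag{15.3.5}$$
$$c_0^*\sum x_i^k + c_1^*\sum x_i^{k+1} + c_2^*\sum x_i^{k+2} + \cdots + c_k^*\sum x_i^{2k} = \sum x_i^ky_i$$
These equations determine the least square estimates $c_0^*, c_1^*, \ldots, c_k^*$ of the parameters.

The residual of $Y$, $V_y$, is given by
$$V_y = Y - c_0^* - c_1^*X - \cdots - c_k^*X^k \tag{15.3.6}$$
and the residuals of the sample, i.e. the values $v_{yi}$ which $V_y$ takes, are given by
$$v_{yi} = y_i - c_0^* - c_1^*x_i - \cdots - c_k^*x_i^k \quad (i = 1, 2, \ldots, n) \tag{15.3.7}$$

1. The first equation of (15.3.5) states that $\sum v_{yi} = 0$, i.e. the sum of the residuals is zero.

2. The normal equations may also be written as
$$\sum v_{yi} = 0, \quad \sum x_iv_{yi} = 0, \quad \ldots, \quad \sum x_i^kv_{yi} = 0 \tag{15.3.8}$$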
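The two remarks above can be verified numerically in the simplest case $k = 1$. The following sketch solves the two normal equations for a straight line in closed form and then evaluates the first two sums of (15.3.8); the data and the function name `fit_line` are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares line y = c0 + c1 x from the normal equations (k = 1)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    c1 = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    c0 = (sy - c1 * sx) / n
    return c0, c1

# Hypothetical data.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 7.1, 8.7]
c0, c1 = fit_line(xs, ys)
residuals = [y - c0 - c1 * x for x, y in zip(xs, ys)]
sum_v = sum(residuals)                                # first relation of (15.3.8)
sum_xv = sum(x * v for x, v in zip(xs, residuals))    # second relation
print(c0, c1, sum_v, sum_xv)
```

Both sums vanish up to rounding error, as the normal equations require.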

Assuming, for simplicity of discussion, that all the $x_i$'s are distinct, the normal equations will yield determinate solutions only if $k \le n - 1$. For if $k \ge n - 1$ or $n \le k + 1$, any $n$ equations of the set (15.3.8) reduce to $v_{yi} = 0$ or
$$y_i = c_0^* + c_1^*x_i + \cdots + c_k^*x_i^k \quad (i = 1, 2, \ldots, n) \tag{15.3.9}$$
so that the rest of the $(k - n + 1)$ equations are identically satisfied, which shows that only $n$ of the $(k + 1)$ normal equations are independent, and if $k > n - 1$ or $n < k + 1$, the $n$ equations (15.3.9) in the $(k + 1)$ unknowns $c_0^*, c_1^*, \ldots, c_k^*$ are necessarily indeterminate. Geometrically, the equations (15.3.9) mean that the best-fitting kth degree parabola would pass through all the $n$ observed points, and if $k > n - 1$, an infinite number of such parabolas are obviously possible. If, however, $k = n - 1$, the solutions of (15.3.9) are unique and the best-fitting parabola passes exactly through all the observed points, i.e. we may say that the case of least square fitting reduces to that of interpolation.

A suitable measure of goodness of fit of the best-fitting parabola to the observed points is provided by
$$R_y = \rho(U_y, Y)$$
where
$$U_y = c_0^* + c_1^*X + \cdots + c_k^*X^k \tag{15.3.10}$$
From (8.14.12) and (8.14.13), $0 \le R_y \le 1$ and
$$R_y^2 = 1 - \frac{\sum v_{yi}^2}{nS_y^2} \tag{15.3.11}$$
where, by (8.14.10),
$$\sum v_{yi}^2 = \sum y_i^2 - c_0^*\sum y_i - c_1^*\sum x_iy_i - \cdots - c_k^*\sum x_i^ky_i \tag{15.3.12}$$


Practical computation. Let us suppose, for convenience, that the data is presented in a two-column xy-table. For the normal equations (15.3.5), we shall have to prepare columns for $x^2, x^3, \ldots, x^{2k}$; $xy, x^2y, \ldots, x^ky$ and a further column for $y^2$ for computing $R_y$.

Linear transformations $x = ax' + b$, $y = cy' + d$ are sometimes helpful. In that case we first find the best-fitting parabola to the points $(x_i', y_i')$ $(i = 1, 2, \ldots, n)$, which when transformed back in terms of $x$, $y$ gives the best-fitting parabola to the original data. We note that this depends upon the fact that by the above transformations a general parabola in the $(x, y)$-plane is transformed again into a general parabola in the $(x', y')$-plane and vice versa. It can be easily seen that the measure of goodness of fit $R_y$ remains invariant under the above transformations.


Example 2. The percentages of protein content (x) and vitreous kernel (y) in 8 samples of wheat were found to be as follows:

| x | 24 | 36 | 45 | 55 | 75 | 84 | 91 | 96 |
|---|---|---|---|---|---|---|---|---|
| y | 10.1 | 10.2 | 10.8 | 11.1 | 12.2 | 13.3 | 14.0 | 15.8 |

Fit a parabola of the form $y = c_0 + c_1x + c_2x^2$ to the above data.

Set $y = y' + 10$.

| x | y | y' | x² | x³ | x⁴ | xy' | x²y' | y'² |
|---|---|---|---|---|---|---|---|---|
| 24 | 10.1 | 0.1 | 576 | 13824 | 331776 | 2.4 | 57.6 | 0.01 |
| 36 | 10.2 | 0.2 | 1296 | 46656 | 1679616 | 7.2 | 259.2 | 0.04 |
| 45 | 10.8 | 0.8 | 2025 | 91125 | 4100625 | 36.0 | 1620.0 | 0.64 |
| 55 | 11.1 | 1.1 | 3025 | 166375 | 9150625 | 60.5 | 3327.5 | 1.21 |
| 75 | 12.2 | 2.2 | 5625 | 421875 | 31640625 | 165.0 | 12375.0 | 4.84 |
| 84 | 13.3 | 3.3 | 7056 | 592704 | 49787136 | 277.2 | 23284.8 | 10.89 |
| 91 | 14.0 | 4.0 | 8281 | 753571 | 68574961 | 364.0 | 33124.0 | 16.00 |
| 96 | 15.8 | 5.8 | 9216 | 884736 | 84934656 | 556.8 | 53452.8 | 33.64 |
| 506 | | 17.5 | 37100 | 2970866 | 250200020 | 1469.1 | 127500.9 | 67.27 |

The normal equations are
$$8c_0'^* + 506c_1'^* + 37100c_2'^* = 17.5$$
$$506c_0'^* + 37100c_1'^* + 2970866c_2'^* = 1469.1$$
$$37100c_0'^* + 2970866c_1'^* + 250200020c_2'^* = 127500.9$$
Solving these we get
$$c_0'^* = 1.371707, \quad c_1'^* = -0.07382217, \quad c_2'^* = 0.001182759$$
Hence the equation of the parabola is
$$y' = 1.372 - 0.07382x + 0.001183x^2$$
or
$$y = 11.372 - 0.07382x + 0.001183x^2$$
which is the required best-fitting parabola.

Now by (15.3.12) $\sum v_{yi}'^2 = 0.9144$ and
$$\bar{y}' = 2.187500, \quad a_{02}' = 8.408750, \quad S_y'^2 = 3.623594$$
Hence by (15.3.11)
$$R_y' = 0.984 \ \text{ or } \ R_y = 0.984$$
This shows that the fit of the above parabola to the observed points is quite good.
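The whole computation of Example 2 can be retraced with a short sketch that builds the normal equations (15.3.5) from the data, solves them, and then evaluates (15.3.12) and (15.3.11). Exact rational arithmetic is used here as a precaution, since the sums range over many orders of magnitude; the helper names `solve` and `fit_parabola` are illustrative and not taken from the text.

```python
import math
from fractions import Fraction

def solve(a, b):
    """Gauss-Jordan elimination in exact rational arithmetic."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n] for row in m]

def fit_parabola(xs, ys, k):
    """Coefficients c0..ck from the normal equations (15.3.5)."""
    s = [sum(x**p for x in xs) for p in range(2 * k + 1)]            # sums of x^p
    t = [sum(x**p * y for x, y in zip(xs, ys)) for p in range(k + 1)]  # sums of x^p y
    a = [[s[i + j] for j in range(k + 1)] for i in range(k + 1)]
    return solve(a, t), t

xs = [Fraction(v) for v in (24, 36, 45, 55, 75, 84, 91, 96)]
ys = [Fraction(v) - 10 for v in
      ("10.1", "10.2", "10.8", "11.1", "12.2", "13.3", "14.0", "15.8")]  # y' = y - 10

coeffs, t = fit_parabola(xs, ys, 2)
n = len(xs)
sum_y2 = sum(y * y for y in ys)
sum_v2 = sum_y2 - sum(c * tp for c, tp in zip(coeffs, t))   # (15.3.12)
n_sy2 = sum_y2 - sum(ys) ** 2 / n                            # n * S_y'^2
r_y = math.sqrt(1 - sum_v2 / n_sy2)                          # (15.3.11)
print([float(c) for c in coeffs], float(sum_v2), r_y)
```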


The correlation coefficient of the sample is found to be 0.942, which shows that the regression lines also fit well to the data, but the parabolic fit is much better.


15.4 MAXIMUM LIKELIHOOD ESTIMATION 
The logic of the process is exactly the same as in the univariate case. Here the likelihood function of the sample $L = L(x_1, \ldots, x_n;\ y_1, \ldots, y_n;\ \theta_1, \ldots, \theta_l)$, $\theta_1, \ldots, \theta_l$ being the population parameters, is defined as follows.

DISCRETE CASE. $L = f_{x_1, y_1}(\theta_1, \ldots, \theta_l) \cdots f_{x_n, y_n}(\theta_1, \ldots, \theta_l)$ where $f_{x_i, y_i}(\theta_1, \ldots, \theta_l) = P(X = x_i, Y = y_i)$.

CONTINUOUS CASE. $L = f(x_1, y_1;\ \theta_1, \ldots, \theta_l) \cdots f(x_n, y_n;\ \theta_1, \ldots, \theta_l)$ where $f(x, y;\ \theta_1, \ldots, \theta_l)$ is the density function of $(X, Y)$.

For fixed sample values we maximise $L$ or $\log L$, the equations for which are
$$\frac{\partial \log L}{\partial \theta_j} = 0 \quad (j = 1, 2, \ldots, l)$$

A set of solutions of these likelihood equations:
$$\hat{\theta}_j = \hat{\theta}_j(x_1, \ldots, x_n;\ y_1, \ldots, y_n) \quad (j = 1, 2, \ldots, l)$$
which corresponds to a unique maximum of $L$ then gives the required maximum likelihood estimates of the parameters.

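Bivariate normal population. The density function of the population is given by (6.4.3), and hence
$$L = (2\pi)^{-n}\sigma_x^{-n}\sigma_y^{-n}(1 - \rho^2)^{-n/2}\exp\left[-\frac{1}{2(1-\rho^2)}\sum_i\left\{\frac{(x_i - m_x)^2}{\sigma_x^2} - \frac{2\rho(x_i - m_x)(y_i - m_y)}{\sigma_x\sigma_y} + \frac{(y_i - m_y)^2}{\sigma_y^2}\right\}\right] \tag{15.4.1}$$
or
$$\log L = -n\log(2\pi) - n\log\sigma_x - n\log\sigma_y - \tfrac{1}{2}n\log(1 - \rho^2) - \frac{1}{2(1-\rho^2)}\sum_i\left\{\frac{(x_i - m_x)^2}{\sigma_x^2} - \frac{2\rho(x_i - m_x)(y_i - m_y)}{\sigma_x\sigma_y} + \frac{(y_i - m_y)^2}{\sigma_y^2}\right\}$$
The likelihood equations are
$$\text{(i)}\ \frac{\partial\log L}{\partial m_x} = 0, \quad \text{(ii)}\ \frac{\partial\log L}{\partial m_y} = 0, \quad \text{(iii)}\ \frac{\partial\log L}{\partial\sigma_x} = 0, \quad \text{(iv)}\ \frac{\partial\log L}{\partial\sigma_y} = 0, \quad \text{(v)}\ \frac{\partial\log L}{\partial\rho} = 0$$

(i) and (ii) give
$$\frac{\sum(x_i - m_x)}{\sigma_x} - \rho\frac{\sum(y_i - m_y)}{\sigma_y} = 0$$
$$\rho\frac{\sum(x_i - m_x)}{\sigma_x} - \frac{\sum(y_i - m_y)}{\sigma_y} = 0$$
Since $\rho^2 \ne 1$, we must have
$$\sum(x_i - m_x) = 0, \quad \sum(y_i - m_y) = 0$$
or
$$m_x = \bar{x}, \quad m_y = \bar{y} \tag{vi}$$
(iii) and (iv) respectively reduce to
$$n - \frac{1}{1-\rho^2}\sum_i\left\{\frac{(x_i - m_x)^2}{\sigma_x^2} - \rho\frac{(x_i - m_x)(y_i - m_y)}{\sigma_x\sigma_y}\right\} = 0$$
and
$$n - \frac{1}{1-\rho^2}\sum_i\left\{\frac{(y_i - m_y)^2}{\sigma_y^2} - \rho\frac{(x_i - m_x)(y_i - m_y)}{\sigma_x\sigma_y}\right\} = 0$$
or using (vi) we get
$$1 - \rho^2 - \frac{S_x^2}{\sigma_x^2} + \rho r\frac{S_xS_y}{\sigma_x\sigma_y} = 0$$
and
$$1 - \rho^2 - \frac{S_y^2}{\sigma_y^2} + \rho r\frac{S_xS_y}{\sigma_x\sigma_y} = 0$$
So
$$\frac{S_x^2}{\sigma_x^2} = \frac{S_y^2}{\sigma_y^2} = \frac{S_xS_y}{\sigma_x\sigma_y} = k \ \text{(say)} \tag{vii}$$
and
$$1 - \rho^2 = k(1 - \rho r) \tag{viii}$$
(v) gives
$$\frac{n\rho}{1-\rho^2} - \frac{\rho}{(1-\rho^2)^2}\sum_i\left\{\frac{(x_i - m_x)^2}{\sigma_x^2} - \frac{2\rho(x_i - m_x)(y_i - m_y)}{\sigma_x\sigma_y} + \frac{(y_i - m_y)^2}{\sigma_y^2}\right\} + \frac{1}{1-\rho^2}\sum_i\frac{(x_i - m_x)(y_i - m_y)}{\sigma_x\sigma_y} = 0$$
Using (vi) this reduces to
$$\rho(1 - \rho^2) - \rho\left(\frac{S_x^2}{\sigma_x^2} - 2\rho r\frac{S_xS_y}{\sigma_x\sigma_y} + \frac{S_y^2}{\sigma_y^2}\right) + (1 - \rho^2)\,r\frac{S_xS_y}{\sigma_x\sigma_y} = 0$$
or
$$(1 - \rho^2)\left(\rho + r\frac{S_xS_y}{\sigma_x\sigma_y}\right) = \rho\left(\frac{S_x^2}{\sigma_x^2} + \frac{S_y^2}{\sigma_y^2} - 2\rho r\frac{S_xS_y}{\sigma_x\sigma_y}\right)$$
or by (vii)
$$(1 - \rho^2)(\rho + kr) = 2\rho k(1 - \rho r)$$
This together with (viii) gives $\rho = kr$, which when substituted again in (viii) shows that $k = 1$, and hence
$$\hat{m}_x = \bar{x}, \quad \hat{m}_y = \bar{y}, \quad \hat{\sigma}_x = S_x, \quad \hat{\sigma}_y = S_y, \quad \hat{\rho} = r$$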
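The conclusion can be checked numerically: evaluated at the sample characteristics, $\log L$ should not be exceeded at nearby parameter values. The sketch below does this for a small hypothetical sample by perturbing each parameter in turn by $\pm 0.05$; the function name `log_likelihood` and the step size are illustrative choices, and the check is a plausibility test rather than a proof.

```python
import math

def log_likelihood(xs, ys, mx, my, sx, sy, rho):
    """log L for a bivariate normal sample, following (15.4.1)."""
    n = len(xs)
    q = sum(((x - mx) / sx) ** 2
            - 2 * rho * (x - mx) * (y - my) / (sx * sy)
            + ((y - my) / sy) ** 2
            for x, y in zip(xs, ys))
    return (-n * math.log(2 * math.pi) - n * math.log(sx) - n * math.log(sy)
            - 0.5 * n * math.log(1 - rho**2) - q / (2 * (1 - rho**2)))

# Hypothetical sample.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 5.0]
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
r = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n * sx * sy)

best = log_likelihood(xs, ys, mx, my, sx, sy, r)
# Perturb each parameter in turn; none of the moves should increase log L.
nearby = [log_likelihood(xs, ys, mx + d[0], my + d[1], sx + d[2], sy + d[3], r + d[4])
          for step in (0.05, -0.05)
          for d in ([step, 0, 0, 0, 0], [0, step, 0, 0, 0], [0, 0, step, 0, 0],
                    [0, 0, 0, step, 0], [0, 0, 0, 0, step])]
print(best, max(nearby))
```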


15.5 EXERCISES 


1. The following table gives the evaporation values in mm, from two 
evaporimeter tanks, one of which is mesh-covered and the other kept in a cage. 
Find the correlation coefficient and the regression lines of the sample. 


Mesh-covered value (mm.) 

Cage 

value 3.5 | 4.5 15.5 |6.5.| 7.5 | 8.5 Se ADS FAL: 

(mm.) |-4.5 | -5.5 |-6.5 | -7.5 | -8.5 | -9.5 | -10.5 -11.5 | -12.5 T otal 
35- 45| 2 1 3 
45 55 1 1 1 3 
55- 65 2 2 1 5 
6.5 - 7.5 3 3 6 
7.5- 85 2 4 5 11 
8.5 - 95 5 5 2 12 
9'5 - 10.5 9 10 3 22 
10.5 - 11.5 3 6 9 

Total 2 2 3 6 6 9 19 15 9 7 


2. Compute the correlation coefficient between the marks (%) obtained in Numerical Analysis Theoretical and Practical at the M.A. and M.Sc. Examination in Applied Mathematics 1961 by 14 candidates given by the following table, and find the lines of regression.


Theoretical Practical Theoretical Practical 
mark mark mark mark 
84 92 46 82 
76 56 40 30 
72 84 38 70 
70 94 38 54 
64 88 34 88 
54 66 30 52 
52 90 28 60 
48 78 22 58 


3. A new-born baby was weighed weekly from birth, and 9 such weights (y) in ounces against ages (x) in weeks are shown below.


0 


газа |s js те 


| 149 


P и» 


141 | 144 


| 150 | 158 | 161 | 166 | 110 


Find the regression line and the regression parabola of the second degree of 
the sample of weight on age, and calculate the goodness of fit in each case. 


4. Fit a straight line (a) $y = c_0 + c_1x$ and parabolas (b) $y = c_0 + c_1x + c_2x^2$ and (c) $y = c_0 + c_2x^2$ to the following data, and compare their goodness of fit.


x| 3.5 | 84 |168 | 239 


27.1 | 28.8 


; 
y| 44 | 92 |206 ES |350 |277 


5. Fit a curve of the form у= x? +ax+b to the following data by the method 
of least squares. 


CHAPTER 16 
TESTING OF HYPOTHESES I


16.1 STATISTICAL HYPOTHESES—SIMPLE AND COMPOSITE 


As stated earlier, the distribution of the population is usually 
unknown, and our task is to derive some rough knowledge about the 
same from a sample drawn from the population. The problem in its 
direct form constitutes the problem of estimation which we have 
already discussed. Now the problem also sometimes presents itself 
in an indirect but an equally important form, viz. testing of some a 
priori knowledge about the population distribution obtained from 
theoretical considerations or otherwise on the basis of a sample. 
This will be the subject of the present study. 

A statistical hypothesis is, in general, defined to be an assumption of any sort about the distribution function of the population, $F(x)$. In this chapter we shall assume that $F(x)$ has a known functional form which involves a number of unknown parameters $\theta_1, \theta_2, \ldots, \theta_k$. Anyone will reflect that this assumption is in itself a statistical hypothesis, which we shall not, however, test for the present but take to be granted with certainty. A hypothesis then consists in any assumption regarding the parameters $\theta_1, \theta_2, \ldots, \theta_k$; for example, some or all of the parameters take prescribed values or lie in prescribed intervals, or one or more given relations exist between them, and so on.

The hypotheses may be classified into two types, simple and composite. A hypothesis of the form
$$H_0 : \theta_j = \theta_{0j} \quad (j = 1, 2, \ldots, k) \tag{16.1.1}$$
where the $\theta_{0j}$'s are given numbers, i.e. a hypothesis which prescribes exact values to all the parameters, is called a simple hypothesis. And a hypothesis which is not simple is a composite hypothesis.

The above notions may be conveniently and precisely stated by means of a geometric formalism. If we write
$$\Theta = (\theta_1, \theta_2, \ldots, \theta_k) \tag{16.1.2}$$
then $\Theta$ represents a point in a k-dimensional space called the parametric space $P_k$, $\Theta$ being called the parametric point. It is to be noted that corresponding to all possible or admissible values of the parameters, the parametric point $\Theta$ may or may not run over the entire space $P_k$, e.g. for a normal $(m, \sigma)$ population the parametric space $P_2$ consists of the two-dimensional $(m, \sigma)$ plane, but since $\sigma$ is necessarily positive, the parametric point is confined to the half-plane $\sigma > 0$.

A statistical hypothesis may now be defined to be any assumption of the form
$$H_0 : \Theta \in \omega \tag{16.1.3}$$
where $\omega$ is a given set of points in $P_k$.

If $\omega$ consists of a single point $\Theta_0 = (\theta_{01}, \theta_{02}, \ldots, \theta_{0k})$, then $H_0 : \Theta = \Theta_0$ or $\theta_j = \theta_{0j}$ $(j = 1, 2, \ldots, k)$, i.e. $H_0$ is a simple hypothesis. If $\omega$ consists of more than one point, the hypothesis $H_0$ is composite.

Example. Consider a normal $(m, \sigma)$ population.

(a) $H_0 : m = 2$, $\sigma = 0.1$ is a simple hypothesis. The point set $\omega$ consists of the single point (2, 0.1) in the parametric plane $P_2$.

(b) $H_0 : m = 2$. Since $\sigma$ is unspecified, $\omega$ consists of the straight line $m = 2$, and hence $H_0$ is a composite hypothesis.

(c) $H_0 : 2 < m < 3$ is a composite hypothesis. Here $\omega$ is the infinite strip lying between the parallels $m = 2$ and $m = 3$.

(d) $H_0 : m = \sigma^2$ is also a composite hypothesis, $\omega$ consisting of the points on the parabola $m = \sigma^2$.

Null hypothesis. We shall very often make a hypothesis wishing it to be rejected by the test. Such a hypothesis is called a null hypothesis. This may seem somewhat strange at first, but we shall presently see that not only from the theoretical standpoint but also in many important types of practical problems statistical hypotheses appear naturally as null hypotheses. Some such practical problems will be cited in the course of our discussions.

Alternative hypothesis. Sometimes it so happens that we know for certain that either $\Theta \in \omega$ or $\Theta \in \omega_1$, where $\omega$ and $\omega_1$ are two disjoint point sets in $P_k$, and it remains for us to decide between the two by means of a test. Now if we have a priori reasons to be more inclined to believe in the latter hypothesis, then we set up the null hypothesis
$$H_0 : \Theta \in \omega$$
to be tested against the alternative hypothesis
$$H_1 : \Theta \in \omega_1 \tag{16.1.4}$$
hoping that the null hypothesis will be rejected by the test and thereby confirm our belief in the alternative. If, however, we do not have sufficient reasons for favouring one hypothesis over the other, then we may set up any one of these as the null hypothesis and the other as the alternative.

In the general theory, therefore, we shall consider a null hypothesis $H_0$ against an alternative $H_1$ as in (16.1.4). In case an alternative is not stated, it naturally means that the alternative is the negation of the null hypothesis, i.e.
$$H_1 : \Theta \in \bar{\omega}$$
where $\bar{\omega}$ is the complement of $\omega$ in $P_k$.


16.2 GENERAL FORM OF A TEST. BEST CRITICAL REGION 
Let a sample of size $n$: $x_1, x_2, \ldots, x_n$ be drawn from the population. On the evidence offered by this sample we shall have to decide whether to accept or reject the null hypothesis. The mathematical formulation of this evidence is known as a test of the hypothesis $H_0$. Since, as we have remarked earlier, it is impossible to make a decision with perfect certainty, we must also state how much confidence can be placed on such a decision, or, we may say, how significant the test is. As such these tests are often called tests of significance. It is, however, customary and also more convenient to measure the significance of a test not in terms of its degree of reliability but in terms of the complementary quantity, the amount of risk of misjudgment taken in pronouncing the decision.

We shall, for simplicity, confine our general discussions to the case of a continuous population only. Let $f(x;\ \theta_1, \theta_2, \ldots, \theta_k) = f(x;\ \Theta)$ denote the density function of the population. The density function of the sample point $x$ is then identical with the likelihood function $L(x_1, x_2, \ldots, x_n;\ \theta_1, \theta_2, \ldots, \theta_k) = L(x;\ \Theta)$* given by (14.1.3).

A test of the hypothesis $H_0$, in its general form, consists in choosing a region $W$ in the sample space $R^n$ such that if the observed position of the sample point $x$ falls in $W$, $H_0$ is rejected, and if it falls in $\bar{W}$, the part of $R^n$ outside $W$, $H_0$ is accepted; $W$ is called the rejection region or the critical region and $\bar{W}$ the acceptance region of the test.


Two types of error. Now the sample point $x$ is a random variable, and as such an observed position of $x$ may be in the critical region as well as outside it. Accordingly, we are liable to make two types of error of decision detailed below.

TYPE I ERROR: If $H_0$ is true, but $x$ falls in the critical region $W$ when we reject $H_0$.

TYPE II ERROR: If $H_0$ is false (and hence $H_1$ is true), but $x$ falls in the acceptance region $\bar{W}$ when we accept $H_0$.

The probability of committing Type I error is clearly
$$P(x \in W \mid \Theta \in \omega) \tag{16.2.1}$$
and that of Type II error is
$$P(x \in \bar{W} \mid \Theta \in \omega_1) = 1 - P(x \in W \mid \Theta \in \omega_1) = 1 - \beta(W) \tag{16.2.2}$$
where
$$\beta(W) = P(x \in W \mid \Theta \in \omega_1) \tag{16.2.3}$$
represents the probability of rejecting $H_0$ when it is false and is called the power of the critical region $W$ with respect to the alternative hypothesis $H_1$, or simply the power of the test.

Best critical region. For constructing good tests we naturally try to reduce the probabilities of both types of error as much as possible. But it is generally found that if the probability of one type of error is decreased, the probability of the other automatically increases, and it becomes impossible to make the probabilities of both the errors arbitrarily small simultaneously. Hence, in order to obtain a useful test, we must strike a balance between the two types of error, and for this it is necessary to formulate a principle which would tell us the best way of doing the same. Now many such principles may be thought of as possible, of which the most satisfactory has been found to be the following.

*Here, for convenience, we have denoted the observed value $(x_1, x_2, \ldots, x_n)$ of the sample point $x$ by the same symbol $x$.

1. First fix the probability of committing the Type I error arbitrarily, i.e. given any number $\varepsilon$ $(0 < \varepsilon < 1)$ in advance, set
$$P(x \in W \mid \Theta \in \omega) = \varepsilon \tag{16.2.4}$$
the number $\varepsilon$ being called the significance level of the test. The equation (16.2.4), in general, gives rise to a family of critical regions at the significance level $\varepsilon$.

2. Now in the above family of critical regions choose, if possible, that particular critical region which would minimise the probability of committing Type II error, i.e. maximise the power of the test. This critical region, if existent, will be called the best critical region and the corresponding test the best test or the most powerful test at the given significance level $\varepsilon$.
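The trade-off between the two types of error can be made concrete with a small sketch. It assumes, purely for illustration, a normal population with known standard deviation, a simple hypothesis $m = m_0$ against a simple alternative $m = m_1 > m_0$, and a critical region of the form "sample mean exceeds a cut-off"; all numerical values and the function name are hypothetical.

```python
import math
from statistics import NormalDist

def error_probabilities(m0, m1, sigma, n, critical_mean):
    """Type I and Type II error probabilities when H0 is rejected
    whenever the sample mean exceeds critical_mean (case m1 > m0)."""
    se = sigma / math.sqrt(n)
    type_1 = 1 - NormalDist(m0, se).cdf(critical_mean)   # reject although H0 is true
    type_2 = NormalDist(m1, se).cdf(critical_mean)       # accept although H1 is true
    return type_1, type_2

# Hypothetical values: m0 = 0, m1 = 1, sigma = 2, n = 16.
for cut in (0.6, 0.8, 1.0):
    t1, t2 = error_probabilities(0.0, 1.0, 2.0, 16, cut)
    print(cut, round(t1, 4), round(t2, 4), round(1 - t2, 4))
```

Moving the cut-off to the right lowers the Type I probability and raises the Type II probability, which is why the level is fixed first and the power maximised afterwards.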


16.3 BEST CRITICAL REGION FOR SIMPLE HYPOTHESES 


The best critical region for testing a simple hypothesis
$$H_0 : \Theta = \Theta_0$$
against a simple alternative
$$H_1 : \Theta = \Theta_1 \tag{16.3.1}$$
is given by the following theorem.

Neyman-Pearson theorem. The set of all points $x$ in the sample space $R^n$ satisfying the inequality
$$\frac{L(x;\ \Theta_0)}{L(x;\ \Theta_1)} < k \tag{16.3.2}$$
is the best critical region $W = W(k)$, where $k\ (> 0)$ is a constant which is determined (if possible) as a function of the given significance level $\varepsilon$ by
$$P(x \in W \mid \Theta = \Theta_0) = \varepsilon \tag{16.3.3}$$


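Proof. Let $W'$ be any other critical region at the significance level $\varepsilon$, i.e.
$$P(x \in W' \mid \Theta = \Theta_0) = \varepsilon \tag{i}$$
Clearly, the theorem will be proved if we can show that the power of the region $W$ is not less than that of $W'$, i.e.
$$\beta(W) \ge \beta(W')$$
From (i) and (16.3.3) we get
$$\int_W L(x;\ \Theta_0)\,dx = \int_{W'} L(x;\ \Theta_0)\,dx \qquad [dx = dx_1\,dx_2 \ldots dx_n]$$
or
$$\int_{W - WW'} L(x;\ \Theta_0)\,dx = \int_{W' - WW'} L(x;\ \Theta_0)\,dx \tag{ii}$$
Now in $W - WW'$
$$L(x;\ \Theta_1) > \frac{1}{k}L(x;\ \Theta_0)$$
and in $W' - WW'$
$$L(x;\ \Theta_1) \le \frac{1}{k}L(x;\ \Theta_0)$$
So
$$\beta(W) = \int_W L(x;\ \Theta_1)\,dx = \int_{W - WW'} L(x;\ \Theta_1)\,dx + \int_{WW'} L(x;\ \Theta_1)\,dx \ge \frac{1}{k}\int_{W - WW'} L(x;\ \Theta_0)\,dx + \int_{WW'} L(x;\ \Theta_1)\,dx$$
and
$$\beta(W') = \int_{W'} L(x;\ \Theta_1)\,dx = \int_{W' - WW'} L(x;\ \Theta_1)\,dx + \int_{WW'} L(x;\ \Theta_1)\,dx \le \frac{1}{k}\int_{W' - WW'} L(x;\ \Theta_0)\,dx + \int_{WW'} L(x;\ \Theta_1)\,dx$$
Hence by (ii) $\beta(W) \ge \beta(W')$, which proves the theorem.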
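A numerical illustration of the theorem, under stated assumptions: for a normal population with known standard deviation and $m_1 > m_0$, the likelihood-ratio inequality leads to a right-tailed region for the standardised mean $U$. The sketch below compares its power with that of a symmetrical two-tailed region of the same size; the value $\delta = 2$ for the mean of $U$ under the alternative, and the function names, are hypothetical.

```python
from statistics import NormalDist

Z = NormalDist()  # distribution of U under H0

def power_right_tailed(delta, eps):
    """Power of the region U > u, where P(U > u) = eps under H0.
    delta is the mean of U under the alternative (delta > 0)."""
    u = Z.inv_cdf(1 - eps)
    return 1 - Z.cdf(u - delta)

def power_two_tailed(delta, eps):
    """Power of the region |U| > u, where P(|U| > u) = eps under H0."""
    u = Z.inv_cdf(1 - eps / 2)
    return (1 - Z.cdf(u - delta)) + Z.cdf(-u - delta)

# Hypothetical alternative: delta = sqrt(n) (m1 - m0) / sigma = 2, eps = 0.05.
p_right = power_right_tailed(2.0, 0.05)
p_two = power_two_tailed(2.0, 0.05)
print(round(p_right, 4), round(p_two, 4))
```

Both regions have the same probability $\varepsilon$ under $H_0$, but the right-tailed one rejects a false $H_0$ more often, in line with the inequality $\beta(W) \ge \beta(W')$.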


Composite alternative. The best critical region $W$, in general, depends on $\Theta_1$. But if the situation is such that for all values of $\Theta_1$ lying in a k-dimensional interval $\alpha < \Theta_1 < \beta$ we get the same best critical region $W$, then $W$ can evidently be regarded as the best critical region for $H_0 : \Theta = \Theta_0$ against the composite alternative $H_1 : \alpha < \Theta < \beta$.

Working rule. Now the above theorem gives an n-dimensional critical region, $n$ being the size of the sample, which is rather impracticable to work with. Accordingly, we deduce the following convenient working rule.

Rearrange the inequality (16.3.2) in any convenient form
$$z \in R \tag{16.3.4}$$
where $z = z(x)$ is a suitable statistic whose density function $f_z(z;\ \Theta)$ can be easily obtained and $R$ is a region of the z-axis, usually consisting of one or two intervals, determined by
$$P(z \in R \mid \Theta = \Theta_0) = \int_R f_z(z;\ \Theta_0)\,dz = \varepsilon \tag{16.3.5}$$
Thus the n-dimensional critical region $W$ dwindles into two things, viz. a statistic $z$ and the corresponding one-dimensional critical region $R$.

To sum up we proceed along the following steps:

1. Write down the likelihood function $L(x;\ \Theta)$.

2. Write the inequality (16.3.2), and reset it in the form (16.3.4). This gives the statistic $z$ and the form of the critical region $R$. This can usually be done in infinitely many ways; of these, choose a form of (16.3.4) which gives the most convenient statistic.

3. Determine $R$ by (16.3.5). This fixes the extent of $R$, which is thereby completely determined.

4. Compute the value of the statistic $z$ from the sample. If this value falls in $R$, reject $H_0$; otherwise accept it. A computed value of the statistic which falls in the critical region is customarily said to be significant.

5. In a practical problem the significance level is not usually decided upon in advance. In such cases we calculate the value of $z$, and find the minimum significance level at which the critical region just includes this value of $z$. If this level is sufficiently small, we can reject $H_0$, and if not, we have to accept $H_0$. The question how small this level should be in order to be able to reject $H_0$ confidently is, of course, entirely relative and depends on the particular nature of the problem we are dealing with. The following convention is, however, often adopted in practice.

A computed value of $z$ will be called (i) not significant if it falls outside the critical region of the 5% level, (ii) simply significant if it falls inside the critical region of the 5% level but outside that of the 1% level and (iii) highly significant if it falls inside the critical region of the 1% level.
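Step 5 and the convention that follows it can be sketched as below. The sketch assumes that the statistic is standard normal under $H_0$, which is one common case and not a general requirement; the function names and the three trial values of the statistic are hypothetical.

```python
from statistics import NormalDist

def minimum_significance_level(u, tail="right"):
    """Smallest level at which the critical region just includes u,
    for a statistic that is standard normal under H0 (an assumption)."""
    z = NormalDist()
    if tail == "right":
        return 1 - z.cdf(u)
    if tail == "left":
        return z.cdf(u)
    return 2 * (1 - z.cdf(abs(u)))   # symmetrical two-tailed region

def verdict(level):
    """Conventional wording based on the 5% and 1% levels."""
    if level > 0.05:
        return "not significant"
    if level > 0.01:
        return "significant"
    return "highly significant"

for u in (1.0, 2.1, 3.0):
    level = minimum_significance_level(u)
    print(u, round(level, 4), verdict(level))
```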
16.4 APPLICATIONS TO NORMAL (m, с) POPULATION 


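Test for m. We assume that $\sigma$ is known and wish to test the
hypothesis

$$H_0 : m = m_0$$

against an alternative

$$H_1 : m = m_1$$

where $m_0$ and $m_1$ are two given unequal numbers.
From Sec. 14.2

$$L(x; m) = (2\pi)^{-n/2}\,\sigma^{-n}\, e^{-\frac{1}{2\sigma^2}\sum (x_i - m)^2}$$

So

$$\frac{L(x; m_0)}{L(x; m_1)} = e^{-\frac{1}{2\sigma^2}\left[\sum (x_i - m_0)^2 - \sum (x_i - m_1)^2\right]} = e^{\frac{n}{2\sigma^2}\left[2(\bar{x} - m_0) + (m_0 - m_1)\right](m_0 - m_1)}$$

Hence the form of the best critical region is given by

$$e^{\frac{n}{2\sigma^2}\left[2(\bar{x} - m_0) + (m_0 - m_1)\right](m_0 - m_1)} < k$$

k being a constant.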


Now two cases arise according as $m_1 >$ or $< m_0$.

Case I. $m_1 > m_0$. In this case the above inequality reduces to
the form $\bar{x} > k'$, $k'$ being another constant, or, more conveniently, to
the form

$$u = \frac{\sqrt{n}\,(\bar{x} - m_0)}{\sigma} > u_\varepsilon \tag{16.4.1}$$

where we know that, under $H_0$, U has a standard normal
distribution. Hence we may choose u as our required statistic,
the corresponding best critical region being the interval
$(u_\varepsilon, \infty)$ where $u_\varepsilon$ is determined by

$$P(U > u_\varepsilon) = \varepsilon \tag{16.4.2}$$

(Fig. 29)

We note that the constant $u_\varepsilon$ is directly determined by (16.4.2), and
hence the statistic u and the corresponding critical region are
independent of the particular value of $m_1$ but depend only on the
prior assumption that $m_1 > m_0$. This obviously means that the above
test serves as the best test of $H_0$ against the composite alternative
$H_1 : m > m_0$.


Case II. $m_1 < m_0$. Here the inequality in question reduces to

$$u = \frac{\sqrt{n}\,(\bar{x} - m_0)}{\sigma} < -u_\varepsilon \tag{16.4.3}$$

so that the best critical region for u is $(-\infty, -u_\varepsilon)$ where

$$P(U < -u_\varepsilon) = \varepsilon$$

or

$$P(U > u_\varepsilon) = \varepsilon$$

which is the same as (16.4.2). (Fig. 30) This again serves as the best test of
$H_0$ against the alternative $H_1 : m < m_0$.

In Case I the critical region is the right tail of the standard normal
density curve, and in Case II it is the left tail of the same, and as such
we say that we are concerned with a right-tailed test in the former and
a left-tailed test in the latter.


For testing $H_0 : m = m_0$ against no specific alternative, i.e. against
the alternative $H_1 : m \neq m_0$, no best critical region is available.
We may, however, consider, by a practical compromise of the
above results, a symmetrical two-tailed test where the critical
region consists of the pair of intervals $(-\infty, -u_\varepsilon)$ and
$(u_\varepsilon, \infty)$, $u_\varepsilon$ being given by

$$P(U < -u_\varepsilon) + P(U > u_\varepsilon) = \varepsilon$$

or

$$P(U > u_\varepsilon) = \tfrac{1}{2}\varepsilon \tag{16.4.4}$$

(Fig. 31)

We may certainly expect that this will prove to be a good test.
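Remark. In the last case for testing $H_0 : m = m_0$ against no
alternative, the hypothesis will be accepted if

$$-u_\varepsilon < u < u_\varepsilon$$

where $u_\varepsilon$ is given by (16.4.4), or

$$-u_\varepsilon < \frac{\sqrt{n}\,(\bar{x} - m_0)}{\sigma} < u_\varepsilon$$

or

$$\bar{x} - \frac{\sigma u_\varepsilon}{\sqrt{n}} < m_0 < \bar{x} + \frac{\sigma u_\varepsilon}{\sqrt{n}}$$

i.e. $m_0$ lies in the interval $\left(\bar{x} - \frac{\sigma u_\varepsilon}{\sqrt{n}},\ \bar{x} + \frac{\sigma u_\varepsilon}{\sqrt{n}}\right)$ which, by (14.5.1) and
(14.5.2), is nothing but the symmetrical confidence interval for m with
confidence coefficient $1 - \varepsilon$. This reconciliation is indeed interesting
as well as logically satisfactory.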
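
The reconciliation noted in the Remark can be checked numerically. The following is a minimal sketch, assuming Python's standard `statistics.NormalDist` for the standard normal quantile; the function names and the trial values of $m_0$ are illustrative only and do not come from the text.

```python
from math import sqrt
from statistics import NormalDist


def two_tailed_accepts(xbar, m0, sigma, n, eps):
    """True if H0: m = m0 is accepted by the symmetrical two-tailed test."""
    # u_eps is defined by P(U > u_eps) = eps / 2, as in (16.4.4)
    u_eps = NormalDist().inv_cdf(1 - eps / 2)
    u = sqrt(n) * (xbar - m0) / sigma
    return abs(u) < u_eps


def confidence_interval(xbar, sigma, n, eps):
    """Symmetrical confidence interval for m with coefficient 1 - eps."""
    u_eps = NormalDist().inv_cdf(1 - eps / 2)
    half_width = sigma * u_eps / sqrt(n)
    return xbar - half_width, xbar + half_width


# Illustrative values: sample mean 74.3, sigma 2.4, n = 15, eps = .05
low, high = confidence_interval(74.3, 2.4, 15, 0.05)
for m0 in (72.6, 73.2, 74.0, 75.6):
    # acceptance by the test and membership in the interval always agree
    print(m0, two_tailed_accepts(74.3, m0, 2.4, 15, 0.05), low < m0 < high)
```

Each printed line shows the same truth value twice, which is exactly the statement of the Remark.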


Example 1. In a ceramic industry the population of percentages of yield of
first class material was known to have a mean 72.6 and standard deviation 2.4 (as
obtained from past large samples). A new incentive bonus was declared, and a
subsequent sample of size 15 from the population gave a mean of 74.3. Does it
reasonably show that the bonus really helped raising the average yield percentage?
(It is known from experience that populations of yield percentages have normal
or approximately normal distributions.)

We assume that the population standard deviation remains unaltered, i.e. the
new population is normal $(m, \sigma)$ where $\sigma = 2.4$. Since we are anticipating a rise in
the average yield percentage or $m > 72.6$, we have to pose the null hypothesis

$$H_0 : m = 72.6 = m_0 \text{ (say)}$$

against the alternative

$$H_1 : m > m_0$$

Here the appropriate test will be a right-tailed standard normal test, and let
us test at 5% level of significance.

For $\varepsilon = .05$, $u_\varepsilon$ is given by

$$P(U > u_\varepsilon) = .05$$

whence, from Table I, $u_\varepsilon = 1.65$, so that the best critical region of the test is
$u > 1.65$.

As $n = 15$, $\bar{x} = 74.3$, $\sigma = 2.4$, $m_0 = 72.6$, the computed value
of $u = \sqrt{n}(\bar{x} - m_0)/\sigma = 2.74$. Since this value of u falls within the critical region,
we reject the null hypothesis that the mean yield percentage continues to be the same,
so that our belief that the incentive bonus was really effective is confirmed.

If, however, we do not fix the significance level beforehand, we proceed as
follows. The computed value of u is 2.74, and from Table I, $P(U > 2.74) = .0031$
which shows that the value of u falls within the critical region of even 1% level
or, according to our terminology, the value of u is highly significant. Hence we
can confidently (i.e. with small risk of a wrong decision) reject $H_0$.
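
The arithmetic of Example 1 may be reproduced by the following sketch. It assumes Python's standard `statistics.NormalDist` in place of Table I, so the critical point and tail probability agree with the tabulated values only up to rounding; the function name is an illustrative choice.

```python
from math import sqrt
from statistics import NormalDist


def right_tailed_u_test(xbar, m0, sigma, n, eps=0.05):
    """Return (u, u_eps, tail probability) for the right-tailed normal test."""
    u = sqrt(n) * (xbar - m0) / sigma          # statistic of (16.4.1)
    u_eps = NormalDist().inv_cdf(1 - eps)      # P(U > u_eps) = eps
    tail = 1 - NormalDist().cdf(u)             # P(U > computed u)
    return u, u_eps, tail


u, u_eps, tail = right_tailed_u_test(xbar=74.3, m0=72.6, sigma=2.4, n=15)
print(round(u, 2))        # the computed value of u
print(u > u_eps)          # True means u falls in the critical region
print(tail < 0.01)        # True means highly significant
```

The tail probability is computed from the unrounded value of u, so it differs slightly from the figure read off the table at u = 2.74.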


Remarks 
1. We know that the sample mean is an estimate of the population mean. 


Now the mean of the sample from the new population is 74.3 which is greater 
than the mean of the old population 72.6 ; but we note that the difference between 
the two is not too marked to enable us to rush to the conclusion that the popula- 
tion mean has really increased, for this difference might also have arisen due to 
random fluctuations. Hence, in order to come to a definite conclusion, we
have to make a statistical test, as done above, which depends on, besides other
factors, the size of the sample, the population standard deviation etc.


2. The above example is one of the many practical problems in which a 
hypothesis arises naturally as a null hypothesis. 


Test for σ. Consider the hypothesis

$$H_0 : \sigma = \sigma_0$$

against an alternative

$$H_1 : \sigma = \sigma_1$$

Since m is unspecified both $H_0$ and $H_1$ are composite hypotheses.
But these can be regarded as simple hypotheses if we consider the
population of the statistic $S^2$ (or $s^2$), the density function of which,
given by (13.4.2), contains the only parameter $\sigma$. When a sample of
size n is drawn from the parent population and the value of $S^2$ is
computed, we get a sample of unit size from the population of $S^2$, for
which the likelihood function is

$$L(S^2; \sigma) = f_{S^2}(S^2; \sigma) = \frac{1}{\Gamma(\nu/2)} \left(\frac{n}{2\sigma^2}\right)^{\nu/2} (S^2)^{\nu/2 - 1}\, e^{-nS^2/2\sigma^2} \qquad [\nu = n - 1]$$

The form of the best critical region is then given by

$$\frac{L(S^2; \sigma_0)}{L(S^2; \sigma_1)} = \left(\frac{\sigma_1}{\sigma_0}\right)^{\nu} e^{-\frac{nS^2}{2}\left(\frac{1}{\sigma_0^2} - \frac{1}{\sigma_1^2}\right)} < k$$

Case I. $\sigma_1 > \sigma_0$. The above inequality may be written in the
form

$$\chi^2 = \frac{nS^2}{\sigma_0^2} > \chi^2_\varepsilon \tag{16.4.5}$$

so that the required statistic may be chosen to be $\chi^2 = nS^2/\sigma_0^2$,
which has, under $H_0$, a $\chi^2$-distribution with
$\nu = n - 1$ degrees of freedom, and the corresponding best critical region
is the right tail $\chi^2 > \chi^2_\varepsilon$ of the $\chi^2$-density curve,
$\chi^2_\varepsilon$ being given by

$$P(\chi^2 > \chi^2_\varepsilon) = \varepsilon \tag{16.4.6}$$

(Fig. 32)

Since the statistic and its best critical region are independent of
the particular value of $\sigma_1$ but depend only on the fact that
$\sigma_1 > \sigma_0$, the test holds against the alternative $H_1 : \sigma > \sigma_0$.


Case II. $\sigma_1 < \sigma_0$. In this case we have

$$\chi^2 = \frac{nS^2}{\sigma_0^2} < \chi^2_\varepsilon \tag{16.4.7}$$

which shows that the best critical region is now the left tail
$0 < \chi^2 < \chi^2_\varepsilon$ where

$$P(0 < \chi^2 < \chi^2_\varepsilon) = \varepsilon$$

or

$$P(\chi^2 > \chi^2_\varepsilon) = 1 - \varepsilon \tag{16.4.8}$$

(Fig. 33)

This gives the best test against the alternative $H_1 : \sigma < \sigma_0$.

In order to test $H_0 : \sigma = \sigma_0$ against no alternative, it is customary
to use a two-tailed $\chi^2$-test, in which the critical region consists of the
intervals $(0, \chi^2_{\varepsilon 1})$ and $(\chi^2_{\varepsilon 2}, \infty)$ such that

$$P(0 < \chi^2 < \chi^2_{\varepsilon 1}) = \tfrac{1}{2}\varepsilon$$

or

$$P(\chi^2 > \chi^2_{\varepsilon 1}) = 1 - \tfrac{1}{2}\varepsilon$$

and

$$P(\chi^2 > \chi^2_{\varepsilon 2}) = \tfrac{1}{2}\varepsilon \tag{16.4.9}$$

(Fig. 34)

Example 2. If in Ex. 1 the sample standard deviation was 1.9, test at 5% level
if our previous assumption that the standard deviation of the new population
continued to be 2.4 was justified.

We are required to test the null hypothesis $H_0 : \sigma = 2.4 = \sigma_0$ (say), and as there
is no specific alternative, the two-tailed $\chi^2$-test should be used.

For $\nu = 14$ degrees of freedom and $\varepsilon = .05$, $\chi^2_{\varepsilon 1}$ and $\chi^2_{\varepsilon 2}$ are given by

$$P(\chi^2 > \chi^2_{\varepsilon 1}) = .975, \qquad P(\chi^2 > \chi^2_{\varepsilon 2}) = .025$$

By Table II $\chi^2_{\varepsilon 1} = 5.569$, $\chi^2_{\varepsilon 2} = 26.342$. Hence the critical region consists of the
intervals (0, 5.569) and (26.342, ∞).

Since $S = 1.9$, $\chi^2 = nS^2/\sigma_0^2 = 9.40$ which does not fall in the critical region, and
hence we accept $H_0$. It will be seen that the value of $\chi^2$ is not significant even at
20% or 30% level, so that our assumption that the standard deviation of the new
population was 2.4 seems quite reasonable.
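
The computation in Example 2 can be sketched as follows. The critical points are simply copied from the example (they are the values quoted there from Table II), since the Python standard library has no chi-square quantile function; the function names are illustrative.

```python
def chi_square_statistic(n, S, sigma0):
    """chi^2 = n S^2 / sigma0^2, where S^2 is the sample variance with divisor n."""
    return n * S ** 2 / sigma0 ** 2


def in_two_tailed_region(chi2, lower_point, upper_point):
    """True if chi2 lies in (0, lower_point) or (upper_point, infinity)."""
    return chi2 < lower_point or chi2 > upper_point


chi2 = chi_square_statistic(n=15, S=1.9, sigma0=2.4)
# Critical points as quoted in Example 2 for 14 degrees of freedom, 5% level
reject = in_two_tailed_region(chi2, lower_point=5.569, upper_point=26.342)
print(round(chi2, 2), reject)
```

Here `reject` comes out false, in agreement with the acceptance of the hypothesis in the example.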



16.5 LIKELIHOOD RATIO TESTING

When the hypotheses are composite or even when they are simple for
which the Neyman-Pearson theorem does not lead to a convenient
test, we may take recourse to another method for constructing tests
of hypotheses. This is the method of likelihood ratio testing which
usually yields a good test but not necessarily the best test. It will,
however, not be possible to treat any general composite hypothesis
by this method, but instead we shall consider a composite null
hypothesis $H_0$ against no specific alternative, of the following forms:
Some of the parameters are specified or some given functional relations
exist between them, so that under the hypothesis $H_0$ the number of
unknown parameters is reduced. Let $\theta'$ denote the set of parameters
still unknown under $H_0$. Suppose, for example, $\theta = (\theta_1, \theta_2, \theta_3)$. (i) If
$H_0 : \theta_1 = 1$, then under $H_0$ the parameter $\theta_1$ becomes known, and the
parameters still unknown are $\theta_2$ and $\theta_3$, and hence $\theta' = (\theta_2, \theta_3)$. (ii) If
$H_0 : \theta_1 + \theta_2 + \theta_3 = 0$ or $\theta_1 = -\theta_2 - \theta_3$, then under $H_0$ the unknown
parameters may be taken to be $\theta_2, \theta_3$, i.e. $\theta' = (\theta_2, \theta_3)$.


Let $L(x; \theta)$ denote the likelihood function of the sample and
$L_0(x; \theta')$ that under $H_0$, i.e.

$$L_0(x; \theta') = L(x; \theta \mid H_0) \tag{16.5.1}$$

If the maximum likelihood estimates $\hat{\theta} = \hat{\theta}(x)$ and $\hat{\theta}' = \hat{\theta}'(x)$ of $\theta$
and $\theta'$ respectively exist, then

$$\max L(x; \theta) = L(x; \hat{\theta}), \qquad \max L_0(x; \theta') = L_0(x; \hat{\theta}')$$

Or, speaking clearly, the maximum value of the likelihood function
$L(x; \theta)$ (for a fixed sample point) when the parametric point $\theta$ is
allowed to vary over the entire admissible part of the parametric
space P, is $L(x; \hat{\theta})$, and the same when the parametric point is
allowed to vary only over a set of points in P, given by $H_0$, is
$L_0(x; \hat{\theta}')$. Obviously it follows that $L_0(x; \hat{\theta}')$ cannot exceed $L(x; \hat{\theta})$,
i.e.

$$0 \le L_0(x; \hat{\theta}') \le L(x; \hat{\theta})$$

Setting

$$\lambda = \frac{L_0(x; \hat{\theta}')}{L(x; \hat{\theta})} \tag{16.5.2}$$

we have

$$0 \le \lambda \le 1 \tag{16.5.3}$$



The statistic $\lambda = \lambda(x)$ which is free from unknown parameters is called
the likelihood ratio for $H_0$, and (16.5.3) shows that the spectrum of
the corresponding variate $\Lambda$ is the interval (0, 1). Now the density
function of $\Lambda$ under $H_0$, in general, depends on $\theta'$; but if, in a particular
case, it is independent of all unknown parameters, we can proceed
to construct a test of $H_0$ as follows.

If $H_0$ is true, $L(x; \theta) = L_0(x; \theta')$, and since maximum likelihood
estimates are known to be good estimates of the parameters, we have

$$L(x; \hat{\theta}) \approx L(x; \theta), \qquad L_0(x; \hat{\theta}') \approx L_0(x; \theta')$$

and hence if $H_0$ is true, $L_0(x; \hat{\theta}') \approx L(x; \hat{\theta})$ or $\lambda \approx 1$. Thus if
the observed value of the statistic $\lambda$ lies close to 1, we may reasonably
believe that $H_0$ is true. On the other hand, if the observed value of
$\lambda$ is close to 0, it follows that $L_0(x; \hat{\theta}') \ll L(x; \hat{\theta})$ or $L_0(x; \theta') \ll L(x; \theta)$
which shows that it is plausible to conclude that $H_0$ is
false. Hence, for testing $H_0$ we can take $\lambda$ as the statistic of the
test, for which the critical region will be $(0, \lambda_\varepsilon)$ where $\lambda_\varepsilon$ $(0 < \lambda_\varepsilon < 1)$
is a constant such that the probability of Type I error,

$$P(0 < \Lambda < \lambda_\varepsilon \mid H_0) = \int_0^{\lambda_\varepsilon} f_\Lambda(\lambda)\, d\lambda = \varepsilon \tag{16.5.4}$$

where $f_\Lambda(\lambda)$ is the density function of $\Lambda$ under $H_0$. Since $f_\Lambda(\lambda)$ is
assumed to be independent of unknown parameters, the equation
(16.5.4) uniquely determines $\lambda_\varepsilon$ as a function of $\varepsilon$.


Remarks

1. The above method does not always yield a test, for it rests
upon the assumption that the density function of $\Lambda$ under $H_0$ does
not depend on unknown parameters which is not always the case.

2. The likelihood ratio test is based on intuitive concepts and not
on any exact logical criterion. We have, in fact, disregarded specific
alternatives and have not also taken direct account of the Type II error.
It can, however, be proved by further investigations that the likelihood
ratio test generally corresponds to relatively small Type II error and as
such is a good test. When a specific alternative to the null hypothesis
is not stated, the best test usually does not exist, and in such cases
the likelihood ratio test provides a good second choice.

3. We can make a transformation $z = z(\lambda)$, in case z is a more convenient
statistic than $\lambda$. If the critical interval $0 < \lambda < \lambda_\varepsilon$ transforms to
the form $z \in R_\varepsilon$ where $R_\varepsilon$ is a region of the z-axis, then $R_\varepsilon$ is the critical
region for the statistic z determined by

$$P(Z \in R_\varepsilon \mid H_0) = \int_{R_\varepsilon} f_Z(z)\, dz = \varepsilon \tag{16.5.5}$$

where $f_Z(z)$ is the density function of Z under $H_0$, which is, of course,
independent of unknown parameters.


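16.6 NORMAL (m, σ) POPULATION

Test for m. $H_0 : m = m_0$

Case I. $\sigma$ known. $H_0$ is a simple hypothesis. From Sec. 14.2

$$L(x; m) = (2\pi)^{-n/2}\,\sigma^{-n}\, e^{-\frac{1}{2\sigma^2}\sum (x_i - m)^2}$$

or

$$\log L = -\frac{1}{2\sigma^2} \sum (x_i - m)^2 + \text{terms independent of } m$$

The likelihood equation is $\frac{\partial}{\partial m} \log L = 0$ or $\sum (x_i - m) = 0$ which gives
$\hat{m} = \bar{x}$. So

$$L(x; \hat{m}) = (2\pi)^{-n/2}\,\sigma^{-n}\, e^{-nS^2/2\sigma^2}$$

The likelihood function under $H_0$,

$$L_0(x) = L(x; m_0) = (2\pi)^{-n/2}\,\sigma^{-n}\, e^{-\frac{1}{2\sigma^2}\sum (x_i - m_0)^2}$$

Hence the likelihood ratio

$$\lambda = \frac{L_0(x)}{L(x; \hat{m})} = e^{-\frac{1}{2\sigma^2}\left[\sum (x_i - m_0)^2 - nS^2\right]} = e^{-n(\bar{x} - m_0)^2/2\sigma^2} = e^{-u^2/2}$$

where

$$u = \frac{\sqrt{n}\,(\bar{x} - m_0)}{\sigma} \tag{16.6.1}$$

Now instead of $\lambda$ it will be more convenient to take u as our statistic
whose sampling distribution under $H_0$ is known to be normal (0, 1).
The critical interval $0 < \lambda < \lambda_\varepsilon$ clearly transforms to the form $|u| > u_\varepsilon$,
i.e. the two tails $(-\infty, -u_\varepsilon)$ and $(u_\varepsilon, \infty)$ of the standard normal
density curve, where $u_\varepsilon$ is given by

$$P(|U| > u_\varepsilon) = \varepsilon$$

or

$$P(U > u_\varepsilon) = \tfrac{1}{2}\varepsilon \tag{16.6.2}$$

Thus we arrive at the same test as given earlier by the Neyman-Pearson
theorem together with a practical compromise (cf. Sec. 16.4).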
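
The identity $\lambda = e^{-u^2/2}$ can be verified numerically by computing the two likelihoods directly. The sketch below does this for a small sample; the data, the value of $m_0$ and the value of $\sigma$ are hypothetical and chosen only for illustration.

```python
from math import exp, log, pi, sqrt


def log_likelihood(xs, m, sigma):
    """Logarithm of the normal (m, sigma) likelihood of the sample xs."""
    n = len(xs)
    return (-0.5 * n * log(2 * pi) - n * log(sigma)
            - sum((x - m) ** 2 for x in xs) / (2 * sigma ** 2))


def likelihood_ratio(xs, m0, sigma):
    """lambda = L(x; m0) / L(x; xbar), sigma being known."""
    xbar = sum(xs) / len(xs)          # maximum likelihood estimate of m
    return exp(log_likelihood(xs, m0, sigma) - log_likelihood(xs, xbar, sigma))


# Hypothetical sample, not taken from the text
xs = [4.1, 5.3, 3.8, 6.0, 4.7]
m0, sigma = 4.0, 1.0

lam = likelihood_ratio(xs, m0, sigma)
u = sqrt(len(xs)) * (sum(xs) / len(xs) - m0) / sigma   # statistic (16.6.1)
print(abs(lam - exp(-u * u / 2)) < 1e-9)               # the two expressions agree
```

Since $\lambda$ is a decreasing function of $|u|$, small values of $\lambda$ correspond to large values of $|u|$, which is why the critical interval for $\lambda$ becomes the pair of tails for u.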


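Case II. $\sigma$ unknown. Here the hypothesis $H_0$ is composite, and
the likelihood ratio test will lead to a result typical of its own. The
likelihood function is given by

$$L(x; m, \sigma) = (2\pi)^{-n/2}\,\sigma^{-n}\, e^{-\frac{1}{2\sigma^2}\sum (x_i - m)^2}$$

We know $\hat{m} = \bar{x}$, $\hat{\sigma} = S$, so that

$$L(x; \hat{m}, \hat{\sigma}) = (2\pi)^{-n/2}\, S^{-n}\, e^{-n/2}$$

Now

$$L_0(x; \sigma) = L(x; m_0, \sigma) = (2\pi)^{-n/2}\,\sigma^{-n}\, e^{-\frac{1}{2\sigma^2}\sum (x_i - m_0)^2}$$

or

$$\log L_0 = -n \log \sigma - \frac{1}{2\sigma^2} \sum (x_i - m_0)^2 + \text{const.}$$

$\frac{\partial}{\partial \sigma} \log L_0 = 0$ gives $-\frac{n}{\sigma} + \frac{1}{\sigma^3} \sum (x_i - m_0)^2 = 0$, or

$$\hat{\sigma}'^2 = \frac{1}{n} \sum (x_i - m_0)^2 = S^2 + (\bar{x} - m_0)^2$$

So

$$L_0(x; \hat{\sigma}') = (2\pi)^{-n/2} \left[S^2 + (\bar{x} - m_0)^2\right]^{-n/2} e^{-n/2}$$

Hence

$$\lambda = \frac{L_0(x; \hat{\sigma}')}{L(x; \hat{m}, \hat{\sigma})} = \left\{1 + \frac{(\bar{x} - m_0)^2}{S^2}\right\}^{-n/2}$$

or

$$\lambda = \left(1 + \frac{t^2}{\nu}\right)^{-n/2} \qquad [\nu = n - 1]$$

where

$$t = \frac{\sqrt{n}\,(\bar{x} - m_0)}{s} \tag{16.6.3}$$

$s^2 = nS^2/(n - 1)$ being the sample variance with divisor $n - 1$.

We know that, under $H_0$, the random variable t has a
t-distribution with $\nu = n - 1$ degrees of freedom. Hence we
choose t as the statistic of the test, and the critical
region $0 < \lambda < \lambda_\varepsilon$ corresponds to $|t| > t_\varepsilon$ where

$$P(|t| > t_\varepsilon) = \varepsilon$$

or

$$P(t > t_\varepsilon) = \tfrac{1}{2}\varepsilon \tag{16.6.4}$$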


ONE-TAILED t-TESTS. To test $H_0 : m = m_0$ against the alternative
$H_1 : m > m_0$, the intuitive modification of the above test will be a
right-tailed t-test, i.e. the critical region will be $t > t_\varepsilon$ where
$P(t > t_\varepsilon) = \varepsilon$. Similarly, for testing $H_0 : m = m_0$ against
$H_1 : m < m_0$ we use a left-tailed t-test, the critical region being
$t < -t_\varepsilon$ where $P(t < -t_\varepsilon) = \varepsilon$ or $P(t > t_\varepsilon) = \varepsilon$.

Example 1. In Ex. 1 Sec. 16.4 we can also do without the assumption regarding
the standard deviation of the new population. In that case we have to make
a right-tailed t-test, using the standard deviation of the sample.

The statistic $t = \sqrt{n}(\bar{x} - m_0)/s = 3.348$. It is seen from Table III that,
corresponding to $\nu = 14$ degrees of freedom, the critical region for 1% significance level
is $t > 2.624$. Hence the observed value of t is highly significant, and we confidently
reject $H_0$.
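
The value of t in this example may be reproduced as follows. The sketch assumes that the sample standard deviation 1.9 quoted in Ex. 2 Sec. 16.4 is S (divisor n), so that $s^2 = nS^2/(n-1)$; this reading is the one that gives the quoted value 3.348. The critical point 2.624 is copied from the example, and the function name is illustrative.

```python
from math import sqrt


def t_statistic(xbar, m0, S, n):
    """t = sqrt(n) (xbar - m0) / s, with s^2 = n S^2 / (n - 1).

    S is assumed to be the sample standard deviation with divisor n.
    """
    s = S * sqrt(n / (n - 1))
    return sqrt(n) * (xbar - m0) / s


n, xbar, m0, S = 15, 74.3, 72.6, 1.9
t = t_statistic(xbar, m0, S, n)
print(round(t, 3))
print(t > 2.624)   # compare with the 1% point quoted for 14 degrees of freedom

# The likelihood ratio written in its two equivalent forms
lam_from_t = (1 + t * t / (n - 1)) ** (-n / 2)
lam_direct = (1 + (xbar - m0) ** 2 / S ** 2) ** (-n / 2)
print(abs(lam_from_t - lam_direct) < 1e-12)
```

The last line checks that the expression of $\lambda$ in terms of t agrees with its expression in terms of $\bar{x}$ and S.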


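
Test for σ. $H_0 : \sigma = \sigma_0$

We have

$$L(x; m, \sigma) = (2\pi)^{-n/2}\,\sigma^{-n}\, e^{-\frac{1}{2\sigma^2}\sum (x_i - m)^2}$$

As before

$$L(x; \hat{m}, \hat{\sigma}) = (2\pi)^{-n/2}\, S^{-n}\, e^{-n/2}$$

Now

$$L_0(x; m) = L(x; m, \sigma_0) = (2\pi)^{-n/2}\,\sigma_0^{-n}\, e^{-\frac{1}{2\sigma_0^2}\sum (x_i - m)^2}$$

Easily we find $\hat{m} = \bar{x}$, and

$$L_0(x; \hat{m}) = (2\pi)^{-n/2}\,\sigma_0^{-n}\, e^{-nS^2/2\sigma_0^2}$$

So

$$\lambda = \frac{L_0(x; \hat{m})}{L(x; \hat{m}, \hat{\sigma})} = e^{n/2} \left(\frac{S}{\sigma_0}\right)^{n} e^{-nS^2/2\sigma_0^2}$$

or

$$\lambda = \lambda(\chi^2) = \left(\frac{\chi^2}{n}\right)^{n/2} e^{-(\chi^2 - n)/2}$$

where

$$\chi^2 = \frac{nS^2}{\sigma_0^2} \tag{16.6.5}$$

is the required statistic, which under $H_0$
is $\chi^2$-distributed with $\nu = n - 1$ degrees of freedom.

For finding the critical region for $\chi^2$, we note that the
inverse function of $\lambda = \lambda(\chi^2)$, $\chi^2 = \chi^2(\lambda)$, is a double-valued
function of $\lambda$; $\lambda = 0$ when $\chi^2 = 0$ and $\chi^2 \to \infty$, and the equation
$\lambda(\chi^2) = \lambda_\varepsilon$ has two solutions $\chi^2 = \chi^2_{\varepsilon 1}, \chi^2_{\varepsilon 2}$ such that

$$\lambda(\chi^2_{\varepsilon 1}) = \lambda(\chi^2_{\varepsilon 2})$$

This equation determines $\chi^2_{\varepsilon 2}$ as a function of $\chi^2_{\varepsilon 1}$
(cf. Fig. 36). Hence the critical region $(0, \lambda_\varepsilon)$ for $\lambda$ corresponds to the pair of
intervals $(0, \chi^2_{\varepsilon 1})$ and $(\chi^2_{\varepsilon 2}, \infty)$ for $\chi^2$ where the only unknown $\chi^2_{\varepsilon 1}$
is now found from the equation

$$P(\chi^2_{\varepsilon 1} < \chi^2 < \chi^2_{\varepsilon 2}) = 1 - \varepsilon$$

The above method of determination of $\chi^2_{\varepsilon 1}$, $\chi^2_{\varepsilon 2}$ is, however,
only theoretical and cannot be followed in practice. In practice, we
usually make a two-tailed test such that the two tails of the $\chi^2$-density
curve have equal areas, each being $\tfrac{1}{2}\varepsilon$, i.e. $\chi^2_{\varepsilon 1}$ and $\chi^2_{\varepsilon 2}$ are given by
(16.4.9). This was, in fact, the result obtained in Sec. 16.4.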
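
The double-valued character of $\chi^2(\lambda)$ can be seen numerically. The sketch below takes $\lambda(\chi^2) = (\chi^2/n)^{n/2} e^{-(\chi^2 - n)/2}$, which rises to its maximum 1 at $\chi^2 = n$ and falls thereafter, and finds by bisection the second solution paired with a given first one. The function names, the tolerance and the trial value 5.0 are illustrative assumptions, not part of the text.

```python
from math import log


def log_lambda(chi2, n):
    """log of lambda(chi2) = (chi2 / n)^(n/2) * exp(-(chi2 - n) / 2)."""
    return 0.5 * n * log(chi2 / n) - 0.5 * (chi2 - n)


def partner_root(chi2_lower, n, tol=1e-10):
    """Given a solution chi2_lower < n, find the solution > n with equal lambda."""
    if not 0 < chi2_lower < n:
        raise ValueError("chi2_lower must lie between 0 and n")
    target = log_lambda(chi2_lower, n)
    low, high = float(n), 2.0 * n
    while log_lambda(high, n) > target:   # widen until the root is bracketed
        high *= 2.0
    while high - low > tol:               # lambda is decreasing to the right of n
        mid = 0.5 * (low + high)
        if log_lambda(mid, n) > target:
            low = mid
        else:
            high = mid
    return 0.5 * (low + high)


upper = partner_root(5.0, n=15)
print(upper > 15)
print(abs(log_lambda(upper, 15) - log_lambda(5.0, 15)) < 1e-8)
```

Finding the pair which also satisfies the probability condition would require the $\chi^2$ distribution function, which is why the equal-tails rule is preferred in practice.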


16.7 COMPARISON OF NORMAL POPULATIONS

Let us be given two normal populations having parameters $(m_1, \sigma_1)$
and $(m_2, \sigma_2)$, from which two independent samples

$$x = (x_1, x_2, \ldots, x_{n_1})$$

of size $n_1$ and

$$x' = (x'_1, x'_2, \ldots, x'_{n_2})$$

of size $n_2$ are respectively drawn. On the basis of these samples, we
shall test the equality of means and variances of the two populations.
But before that, we prove the following results which will be necessary
in the sequel. Let the characteristics of the first and second samples
be marked with subscripts 1 and 2 respectively.

Theorem. If $\sigma_1 = \sigma_2 = \sigma$, the statistic

(a) $U = \sqrt{\dfrac{n_1 n_2}{n_1 + n_2}}\; \dfrac{(\bar{X}_1 - \bar{X}_2) - (m_1 - m_2)}{\sigma}$ is normal (0, 1),

(b) $\chi^2 = \dfrac{n_1 S_1^2 + n_2 S_2^2}{\sigma^2}$ is $\chi^2$-distributed with $\nu = n_1 + n_2 - 2$
degrees of freedom,

(c) $t = \sqrt{\dfrac{n_1 n_2 \nu}{n_1 + n_2}}\; \dfrac{(\bar{X}_1 - \bar{X}_2) - (m_1 - m_2)}{\sqrt{n_1 S_1^2 + n_2 S_2^2}}$ is t-distributed with
$\nu = n_1 + n_2 - 2$ degrees of freedom, and

(d) $F = \dfrac{s_1^2}{s_2^2}$, called the variance ratio, has an F-distribution with
parameters $\nu_1 = n_1 - 1$ and $\nu_2 = n_2 - 1$.



Proof. We know that $\bar{X}_1$ and $\bar{X}_2$ are respectively normal
$(m_1, \sigma/\sqrt{n_1})$ and $(m_2, \sigma/\sqrt{n_2})$. Since x and x' are independent, $\bar{X}_1$
and $\bar{X}_2$ are also so, and hence $\bar{X}_1 - \bar{X}_2$ is normal
$\left(m_1 - m_2,\ \sigma\sqrt{(n_1 + n_2)/n_1 n_2}\right)$ which immediately leads to (a).

Now $n_1 S_1^2/\sigma^2$ and $n_2 S_2^2/\sigma^2$ are independent $\chi^2$-variates with
$\nu_1 = n_1 - 1$ and $\nu_2 = n_2 - 1$ degrees of freedom respectively, and hence
their sum $(n_1 S_1^2 + n_2 S_2^2)/\sigma^2 = \chi^2$ is $\chi^2$-distributed with
$\nu_1 + \nu_2 = n_1 + n_2 - 2 = \nu$ degrees of freedom which proves (b).

Since $\bar{X}_1$ and $S_1^2$ as well as $\bar{X}_2$ and $S_2^2$ are independent, it
follows that U and $\chi^2$ are independent, and by Theorem I Sec. 9.2
$\sqrt{\nu}\, U/\sqrt{\chi^2} = t$ has a t-distribution with $\nu$ degrees of freedom. This
is (c).

Again since $n_1 S_1^2/\sigma^2$ and $n_2 S_2^2/\sigma^2$ are independent $\chi^2$-variates
with $\nu_1$ and $\nu_2$ degrees of freedom respectively, by Theorem I
Sec. 9.3

$$\frac{\nu_2\, n_1 S_1^2/\sigma^2}{\nu_1\, n_2 S_2^2/\sigma^2} = \frac{s_1^2}{s_2^2} = F$$

has an F-distribution with parameters $\nu_1, \nu_2$.
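
The likelihood functions of the two samples are respectively

$$L_1(x; m_1, \sigma_1) = (2\pi)^{-n_1/2}\,\sigma_1^{-n_1}\, e^{-\frac{1}{2\sigma_1^2}\sum (x_i - m_1)^2}$$

and

$$L_2(x'; m_2, \sigma_2) = (2\pi)^{-n_2/2}\,\sigma_2^{-n_2}\, e^{-\frac{1}{2\sigma_2^2}\sum (x'_j - m_2)^2}$$

so that their joint likelihood function (i.e. the joint density function
of the independent random variables x and x') is given by

$$L(x, x'; m_1, m_2, \sigma_1, \sigma_2) = L_1(x; m_1, \sigma_1)\, L_2(x'; m_2, \sigma_2)$$

$$= (2\pi)^{-(n_1 + n_2)/2}\,\sigma_1^{-n_1}\sigma_2^{-n_2}\, e^{-\left[\frac{\sum (x_i - m_1)^2}{2\sigma_1^2} + \frac{\sum (x'_j - m_2)^2}{2\sigma_2^2}\right]} \tag{16.7.1}$$

Test of equality of means. Assuming $\sigma_1 = \sigma_2 = \sigma$ (say), we shall
construct a test for the hypothesis $H_0 : m_1 = m_2$.

$$L(x, x'; m_1, m_2, \sigma) = (2\pi)^{-(n_1 + n_2)/2}\,\sigma^{-(n_1 + n_2)}\, e^{-\frac{1}{2\sigma^2}\left[\sum (x_i - m_1)^2 + \sum (x'_j - m_2)^2\right]} \tag{16.7.2}$$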

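Case I. $\sigma$ known. We have

$$L(x, x'; m_1, m_2) = (2\pi)^{-(n_1 + n_2)/2}\,\sigma^{-(n_1 + n_2)}\, e^{-\frac{1}{2\sigma^2}\left[\sum (x_i - m_1)^2 + \sum (x'_j - m_2)^2\right]}$$

The likelihood equations are

$$\frac{\partial \log L}{\partial m_1} = 0, \qquad \frac{\partial \log L}{\partial m_2} = 0$$

which respectively reduce to

$$\sum (x_i - m_1) = 0, \qquad \sum (x'_j - m_2) = 0$$

giving

$$\hat{m}_1 = \bar{x}_1, \qquad \hat{m}_2 = \bar{x}_2$$

So

$$L(x, x'; \hat{m}_1, \hat{m}_2) = (2\pi)^{-(n_1 + n_2)/2}\,\sigma^{-(n_1 + n_2)}\, e^{-(n_1 S_1^2 + n_2 S_2^2)/2\sigma^2}$$

The joint likelihood function under $H_0 : m_1 = m_2 = m$ (say),

$$L_0(x, x'; m) = L(x, x'; m, m) = (2\pi)^{-(n_1 + n_2)/2}\,\sigma^{-(n_1 + n_2)}\, e^{-\frac{1}{2\sigma^2}\left[\sum (x_i - m)^2 + \sum (x'_j - m)^2\right]}$$

The equation $\frac{\partial}{\partial m} \log L_0 = 0$ gives $\sum (x_i - m) + \sum (x'_j - m) = 0$ or

$$\hat{m} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2}$$

Now

$$\sum (x_i - \hat{m})^2 + \sum (x'_j - \hat{m})^2 = \sum (x_i - \bar{x}_1)^2 + n_1(\bar{x}_1 - \hat{m})^2 + \sum (x'_j - \bar{x}_2)^2 + n_2(\bar{x}_2 - \hat{m})^2$$

$$= n_1 S_1^2 + n_2 S_2^2 + \frac{n_1 n_2}{n_1 + n_2}(\bar{x}_1 - \bar{x}_2)^2$$

So

$$L_0(x, x'; \hat{m}) = (2\pi)^{-(n_1 + n_2)/2}\,\sigma^{-(n_1 + n_2)}\, e^{-\frac{1}{2\sigma^2}\left[n_1 S_1^2 + n_2 S_2^2 + \frac{n_1 n_2}{n_1 + n_2}(\bar{x}_1 - \bar{x}_2)^2\right]}$$

The likelihood ratio

$$\lambda = \frac{L_0(x, x'; \hat{m})}{L(x, x'; \hat{m}_1, \hat{m}_2)} = e^{-u^2/2}$$

where

$$u = \sqrt{\frac{n_1 n_2}{n_1 + n_2}}\; \frac{\bar{x}_1 - \bar{x}_2}{\sigma} \tag{16.7.3}$$

provides the suitable statistic for the test, the sampling distribution
of which is normal (0, 1) under $H_0$. The critical region for $\lambda$ is
$0 < \lambda < \lambda_\varepsilon$ and hence the same for u will be $|u| > u_\varepsilon$, $u_\varepsilon$ being given
by

$$P(|U| > u_\varepsilon) = \varepsilon$$

or

$$P(U > u_\varepsilon) = \tfrac{1}{2}\varepsilon \tag{16.7.4}$$

For testing $H_0$ against the alternative $H_1 : m_1 > m_2$ or $m_1 < m_2$,
we naturally consider one-tailed tests, a right-tailed test for the former
and a left-tailed test for the latter.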


Another method. We may suggest another simple method for
deducing the above test. It is known that the population of the
statistic $\bar{X}_1 - \bar{X}_2$ is normal $\left(m_1 - m_2,\ \sigma\sqrt{(n_1 + n_2)/n_1 n_2}\right)$, so that
$H_0 : m_1 = m_2$ is equivalent to the hypothesis that the mean of this
population is zero. Since the computed value of $\bar{x}_1 - \bar{x}_2$ may be treated as a
sample of size 1 from the corresponding population, the above test
follows as a particular case of Sec. 16.6.


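Case II. $\sigma$ unknown. The joint likelihood function is given by
(16.7.2). The likelihood equations are

$$\frac{\partial \log L}{\partial m_1} = 0, \qquad \frac{\partial \log L}{\partial m_2} = 0, \qquad \frac{\partial \log L}{\partial \sigma} = 0$$

which respectively give

$$\hat{m}_1 = \bar{x}_1, \qquad \hat{m}_2 = \bar{x}_2, \qquad \hat{\sigma}^2 = \frac{n_1 S_1^2 + n_2 S_2^2}{n_1 + n_2}$$

giving

$$L(x, x'; \hat{m}_1, \hat{m}_2, \hat{\sigma}) = (2\pi)^{-(n_1 + n_2)/2}\,(n_1 + n_2)^{(n_1 + n_2)/2}\,(n_1 S_1^2 + n_2 S_2^2)^{-(n_1 + n_2)/2}\, e^{-(n_1 + n_2)/2}$$

Under $H_0 : m_1 = m_2 = m$ (say) the likelihood function becomes

$$L_0(x, x'; m, \sigma) = L(x, x'; m, m, \sigma) = (2\pi)^{-(n_1 + n_2)/2}\,\sigma^{-(n_1 + n_2)}\, e^{-\frac{1}{2\sigma^2}\left[\sum (x_i - m)^2 + \sum (x'_j - m)^2\right]}$$

$\frac{\partial}{\partial m} \log L_0 = 0$ gives $\sum (x_i - m) + \sum (x'_j - m) = 0$ or

$$\hat{m} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2}$$

and $\frac{\partial}{\partial \sigma} \log L_0 = 0$ gives

$$-\frac{n_1 + n_2}{\sigma} + \frac{1}{\sigma^3}\left[\sum (x_i - m)^2 + \sum (x'_j - m)^2\right] = 0$$

or

$$\hat{\sigma}'^2 = \frac{1}{n_1 + n_2}\left\{n_1 S_1^2 + n_2 S_2^2 + \frac{n_1 n_2}{n_1 + n_2}(\bar{x}_1 - \bar{x}_2)^2\right\}$$

So

$$L_0(x, x'; \hat{m}, \hat{\sigma}') = (2\pi)^{-(n_1 + n_2)/2}\,(n_1 + n_2)^{(n_1 + n_2)/2} \left\{n_1 S_1^2 + n_2 S_2^2 + \frac{n_1 n_2}{n_1 + n_2}(\bar{x}_1 - \bar{x}_2)^2\right\}^{-(n_1 + n_2)/2} e^{-(n_1 + n_2)/2}$$

Hence

$$\lambda = \frac{L_0(x, x'; \hat{m}, \hat{\sigma}')}{L(x, x'; \hat{m}_1, \hat{m}_2, \hat{\sigma})} = \left\{1 + \frac{n_1 n_2}{n_1 + n_2}\, \frac{(\bar{x}_1 - \bar{x}_2)^2}{n_1 S_1^2 + n_2 S_2^2}\right\}^{-(n_1 + n_2)/2}$$

or

$$\lambda = \left(1 + \frac{t^2}{\nu}\right)^{-(n_1 + n_2)/2} \qquad [\nu = n_1 + n_2 - 2]$$

where

$$t = \sqrt{\frac{n_1 n_2 \nu}{n_1 + n_2}}\; \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{n_1 S_1^2 + n_2 S_2^2}} \tag{16.7.5}$$

Under $H_0$ the variate t has a t-distribution with $\nu$ degrees of freedom,
and hence t can be chosen as the statistic of the test. The critical
interval $0 < \lambda < \lambda_\varepsilon$ corresponds to the critical region $|t| > t_\varepsilon$ where

$$P(|t| > t_\varepsilon) = \varepsilon$$

or

$$P(t > t_\varepsilon) = \tfrac{1}{2}\varepsilon \tag{16.7.6}$$

A right-tailed test should be used for the alternative $H_1 : m_1 > m_2$,
and a left-tailed test for the alternative $H_1 : m_1 < m_2$.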


Example 1. There are two brands, A and B, of a type of string, of which the
brand A sells at a slightly higher price than the brand B. 12 pieces of each
brand of string were chosen at random, and their breaking strengths observed.
The sample mean and variance for the A-strings were 18.8 lb and 4.08 lb², while
those for the B-strings were 16.9 lb and 3.25 lb² respectively. Assuming that
populations of breaking strengths of strings are normal and those of the A and
B-strings have the same variance, test whether the A-strings are on the average better
than the B-strings in respect of breaking strength.

Calling the populations of breaking strengths of the A and B-strings the
first and second populations respectively, we set up the null hypothesis
$H_0 : m_1 = m_2$ against the alternative $H_1 : m_1 > m_2$. For testing this a right-tailed
t-test should be made, where the statistic t is given by (16.7.5).

Here $n_1 = n_2 = 12$, $\nu = 22$, $\bar{x}_1 = 18.8$, $S_1^2 = 4.08$, $\bar{x}_2 = 16.9$, $S_2^2 = 3.25$ which give
$t = 2.328$. Now, for 22 degrees of freedom, the critical region for 5% level is
$t > 1.717$, and that for 1% level is $t > 2.508$. These show that the value of
t falls within the critical region of 5% level but outside that of 1% level, or, in
other words, the value of t is significant. Hence we have some reasons for
rejecting the null hypothesis and believing that the mean breaking strength of the
A-strings is greater than that of the B-strings, although we are not very confident
about it.
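
The value of t in this example follows directly from (16.7.5), as the sketch below shows. The sample variances are taken to be $S_1^2$, $S_2^2$ (divisor n), as in the example; the critical points 1.717 and 2.508 are copied from the example, and the function name is illustrative.

```python
from math import sqrt


def two_sample_t(n1, xbar1, S1_sq, n2, xbar2, S2_sq):
    """Statistic (16.7.5) for testing m1 = m2 when the common sigma is unknown.

    S1_sq and S2_sq are the sample variances with divisors n1 and n2.
    """
    nu = n1 + n2 - 2
    t = (sqrt(n1 * n2 * nu / (n1 + n2)) * (xbar1 - xbar2)
         / sqrt(n1 * S1_sq + n2 * S2_sq))
    return t, nu


t, nu = two_sample_t(12, 18.8, 4.08, 12, 16.9, 3.25)
print(round(t, 3), nu)
print(1.717 < t < 2.508)   # significant at 5% level but not at 1% level
```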


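Test for equality of variances. $H_0 : \sigma_1 = \sigma_2$

The likelihood function is given by (16.7.1). Since $L = L_1 L_2$, we
shall obviously get

$$\hat{m}_1 = \bar{x}_1, \qquad \hat{m}_2 = \bar{x}_2, \qquad \hat{\sigma}_1 = S_1, \qquad \hat{\sigma}_2 = S_2$$

and

$$L(x, x'; \hat{m}_1, \hat{m}_2, \hat{\sigma}_1, \hat{\sigma}_2) = (2\pi)^{-(n_1 + n_2)/2}\, S_1^{-n_1} S_2^{-n_2}\, e^{-(n_1 + n_2)/2}$$

The joint likelihood function under $H_0 : \sigma_1 = \sigma_2 = \sigma$ (say),
$L_0(x, x'; m_1, m_2, \sigma)$ becomes identical with the likelihood function
given by (16.7.2), and hence it follows from the preceding discussions
that

$$L_0(x, x'; \hat{m}_1, \hat{m}_2, \hat{\sigma}) = (2\pi)^{-(n_1 + n_2)/2}\,(n_1 + n_2)^{(n_1 + n_2)/2}\,(n_1 S_1^2 + n_2 S_2^2)^{-(n_1 + n_2)/2}\, e^{-(n_1 + n_2)/2}$$

Hence

$$\lambda = \lambda(F) = (n_1 + n_2)^{(n_1 + n_2)/2} \left(\frac{1}{n_1}\right)^{n_1/2} \left(\frac{1}{n_2}\right)^{n_2/2} \left(1 + \frac{\nu_2}{\nu_1 F}\right)^{-n_1/2} \left(1 + \frac{\nu_1 F}{\nu_2}\right)^{-n_2/2} \qquad [\nu_1 = n_1 - 1,\ \nu_2 = n_2 - 1]$$

where

$$F = \frac{s_1^2}{s_2^2} \tag{16.7.7}$$

gives the required statistic, which under
$H_0$ is known to be F-distributed with parameters $\nu_1, \nu_2$.
Now $\lambda = 0$ when $F = 0$ as well as $F \to \infty$, and the equation
$\lambda(F) = \lambda_\varepsilon$ gives two solutions $F = F_{\varepsilon 1}, F_{\varepsilon 2}$ such that
$\lambda(F_{\varepsilon 1}) = \lambda(F_{\varepsilon 2})$ as shown in Fig. 37 so
that $F_{\varepsilon 2}$ is obtained as a function of $F_{\varepsilon 1}$. Therefore, the critical
interval $0 < \lambda < \lambda_\varepsilon$ transforms to the two tails $0 < F < F_{\varepsilon 1}$ and
$F > F_{\varepsilon 2}$ of the F-density curve, $F_{\varepsilon 1}$ being finally determined by

$$P(F_{\varepsilon 1} < F < F_{\varepsilon 2}) = 1 - \varepsilon$$

In practical problems, however, we simply consider equal
area tails, i.e.

$$P(0 < F < F_{\varepsilon 1}) = \tfrac{1}{2}\varepsilon$$

or

$$P(F > F_{\varepsilon 1}) = 1 - \tfrac{1}{2}\varepsilon$$

and

$$P(F > F_{\varepsilon 2}) = \tfrac{1}{2}\varepsilon \tag{16.7.8}$$

(Fig. 38)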



Practical computation. The F-distribution, we note, depends on
two parameters m, n, and as such preparation of detailed tables of the
same becomes a difficult affair. Tables IV and V at the end of the
book show only the points $F_{.05}$ and $F_{.01}$ respectively, where the notation
$F_\varepsilon$ is defined by $P(F > F_\varepsilon) = \varepsilon$, corresponding to different values of
m, n. These are respectively called the 5% and 1% points of the
F-distribution. Thus for an F-test the right critical point can be directly
obtained from the table (if, say, we want to test at 5% significance
level, the right critical point is $F_{.025}$ which can be obtained from the
values of $F_{.05}$ and $F_{.01}$ by the rule of proportional parts) but not the
left critical point. The latter, however, can be calculated by making
use of the fact that 1/F is also F-distributed having parameters n, m.
For example, to find $F_{.95}$ we have

$$P(F > F_{.95}) = .95$$

or

$$P(F < F_{.95}) = .05$$

which is equivalent to

$$P\left(\frac{1}{F} > \frac{1}{F_{.95}}\right) = .05$$

so that $1/F_{.95}$ is the 5% point corresponding to the parameters n, m,
from which we get $F_{.95}$.

We can, however, avoid calculating the left critical point altogether
if we adopt the following procedure. We remember that for
moderately large values of m, n the mode of the F-distribution occurs
near about 1, and usually $F_{\varepsilon 1} < 1$ and $F_{\varepsilon 2} > 1$. Now if we place the
larger $s^2$ in the numerator of the variance ratio (i.e. call the sample
corresponding to the larger $s^2$ the first sample), the computed value
of F will always be greater than 1 and can never fall in the left critical
region, and hence it will suffice to calculate the right critical point
$F_{\varepsilon 2}$ only. This evidently means that instead of a two-tailed test at
significance level $\varepsilon$, we are using a right-tailed test at level $\tfrac{1}{2}\varepsilon$.

If we want to test $H_0 : \sigma_1 = \sigma_2$ against the alternative $H_1 : \sigma_1 > \sigma_2$,
the proper test will, however, be a right-tailed test at the given
significance level $\varepsilon$. For testing $H_0$ against the alternative
$H_1 : \sigma_1 < \sigma_2$ a left-tailed test should be used, but this can again be
reduced to a right-tailed test by simply interchanging the two
populations. Thus in all cases we can conveniently stick to the
right-tailed F-test.


Example 2. In the previous example test the assumption that the populations
of breaking strengths of the A and B-strings have a common variance.

The null hypothesis is $H_0 : \sigma_1 = \sigma_2$ against no alternative, which has to be
tested by means of a two-tailed F-test.

The variances of the two samples are 4.08 and 3.25, and denoting the greater
of these by $S_1^2$ we have $S_1^2 = 4.08$, $S_2^2 = 3.25$, so that $F = s_1^2/s_2^2 = 1.26$.

At 10% significance level the right critical point $F_{\varepsilon 2}$ is given by $P(F > F_{\varepsilon 2}) = .05$
corresponding to the parameters $\nu_1 = 11$, $\nu_2 = 11$. By Table IV $F_{\varepsilon 2} = 2.83$, i.e. the
right critical interval is $F > 2.83$ which shows that $H_0$ is accepted even at 10%
level.
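
The device of placing the larger $s^2$ in the numerator may be sketched as follows. The sketch assumes that the given sample variances are $S^2$ (divisor n) and converts them by $s^2 = nS^2/(n-1)$; the right critical point 2.83 is copied from the example, and the function name is illustrative.

```python
def variance_ratio(n1, S1_sq, n2, S2_sq):
    """Return (F, nu1, nu2) with the larger s^2 placed in the numerator.

    S1_sq and S2_sq are sample variances with divisors n1 and n2;
    s^2 = n S^2 / (n - 1) is the corresponding variance with divisor n - 1.
    """
    s1_sq = n1 * S1_sq / (n1 - 1)
    s2_sq = n2 * S2_sq / (n2 - 1)
    if s1_sq >= s2_sq:
        return s1_sq / s2_sq, n1 - 1, n2 - 1
    # interchange the samples so that F is never less than 1
    return s2_sq / s1_sq, n2 - 1, n1 - 1


F, nu1, nu2 = variance_ratio(12, 4.08, 12, 3.25)
print(round(F, 2), nu1, nu2)
print(F > 2.83)   # False: F does not fall in the right critical interval
```

Calling the function with the two samples interchanged gives the same value of F, so only the right critical point is ever needed.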


Remark, In Exs. 1 and 2 above, as also in many other problems, the population 
mean gives an index of the general or average quality of the products. The 
population standard deviation then provides a measure of variability or lack of 
uniformity of the quality. For good products the population naturally should 
have a large mean and a small standard deviation. 


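16.8 BIVARIATE NORMAL POPULATION

We shall here set up a test for the null hypothesis $H_0 : \rho = 0$. For
this, we will assume the following theorem whose deduction is
somewhat complicated.

Theorem. If $\rho = 0$, the statistic $t = \sqrt{\nu}\, r/\sqrt{1 - r^2}$ has a t-distribution
with $\nu = n - 2$ degrees of freedom.

The likelihood function $L(x, y; m_x, m_y, \sigma_x, \sigma_y, \rho)$ is given by
(15.5.4). Since $\hat{m}_x = \bar{x}$, $\hat{m}_y = \bar{y}$, $\hat{\sigma}_x = S_x$, $\hat{\sigma}_y = S_y$, $\hat{\rho} = r$, we have

$$L(x, y; \hat{m}_x, \hat{m}_y, \hat{\sigma}_x, \hat{\sigma}_y, \hat{\rho}) = (2\pi)^{-n}\, S_x^{-n} S_y^{-n}\, (1 - r^2)^{-n/2}\, e^{-n}$$

The likelihood function under $H_0$ is $L_0(x, y; m_x, m_y, \sigma_x, \sigma_y)$.
Easily we obtain

$$\hat{m}_x = \bar{x}, \qquad \hat{m}_y = \bar{y}, \qquad \hat{\sigma}_x = S_x, \qquad \hat{\sigma}_y = S_y$$

So

$$L_0(x, y; \hat{m}_x, \hat{m}_y, \hat{\sigma}_x, \hat{\sigma}_y) = (2\pi)^{-n}\, S_x^{-n} S_y^{-n}\, e^{-n}$$

The likelihood ratio

$$\lambda = (1 - r^2)^{n/2} = \left(1 + \frac{t^2}{\nu}\right)^{-n/2} \qquad [\nu = n - 2]$$

where

$$t = \frac{\sqrt{\nu}\, r}{\sqrt{1 - r^2}} \tag{16.8.1}$$

By the above theorem, the variate t under $H_0$ has a t-distribution
with $\nu = n - 2$ degrees of freedom, and hence t is our required statistic.
The critical region for t which corresponds to the critical region
$0 < \lambda < \lambda_\varepsilon$ for $\lambda$ is $|t| > t_\varepsilon$ where $t_\varepsilon$ is given by

$$P(|t| > t_\varepsilon) = \varepsilon$$

or

$$P(t > t_\varepsilon) = \tfrac{1}{2}\varepsilon \tag{16.8.2}$$

In case we have an alternative $H_1 : \rho > 0$ or $\rho < 0$, we have to
use one-tailed t-tests, a right-tailed test for $H_1 : \rho > 0$ and a
left-tailed test for $H_1 : \rho < 0$.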


Example. A random sample of size 10 from a bivariate normal population is
found to have a correlation coefficient 0.47. Test if the population can be
regarded as uncorrelated.

$H_0 : \rho = 0$, and the test is a two-tailed t-test, the statistic being given by
(16.8.1).

Here $r = 0.47$, $\nu = 8$ so that $t = 1.506$. Now $P(|t| > 1.506) = .18$ which shows
that the value of t is not at all significant. Hence we can reasonably regard the
population as uncorrelated.
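
The statistic of this example, and the equality of the two forms of the likelihood ratio, may be checked by the following sketch; the function name is illustrative.

```python
from math import sqrt


def correlation_t(r, n):
    """Statistic (16.8.1): t = sqrt(nu) r / sqrt(1 - r^2), with nu = n - 2."""
    nu = n - 2
    return sqrt(nu) * r / sqrt(1 - r * r), nu


r, n = 0.47, 10
t, nu = correlation_t(r, n)
print(round(t, 3), nu)

# The likelihood ratio in its two equivalent forms
lam_from_r = (1 - r * r) ** (n / 2)
lam_from_t = (1 + t * t / nu) ** (-n / 2)
print(abs(lam_from_r - lam_from_t) < 1e-12)
```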


16.9 EXERCISES 


1. Fora normal (m, с) population with known m, 
Neyman-Pearson theorem, a test for the null hypothesis Ho 


construct, by means of the 
:e-o, against the 


alternative Н, :& < vo OT с > Co. 
2. Find, by the method of likelihood ratio testing, a test of Ho : с=со for a 


anormal (m, e) population assuming that m is known. 
3, In Ex. 11 Sec. 14.7 test at 5% significance level if the mean score of the 


population can be regarded as 15.5. 


300 TESTING OF HYPOTHESES I [16.9 


4. The percentage of carbon content of a certain variety of steel has a standard 
specification of .05. For 12 samples of this steel, the percentages of carbon content 
were found to have an average .0483 and standard deviation .00117. Do these data 


reasonably conform to the standard specification ? (Assume that the population 
of percentages of carbon content is normal.) 


5. А drug is given to 10 Patients, and the increments in their blood pressure 
Were recorded to be 3, 6, —2, 4, —3, 4, 6,0, 0,2. Is it reasonable to believe that 
the drug has no effect on change of blood pressure ? Test at 595 significance level, 
assuming the population to be normal. 


8. In Ex. 13 Sec. 14.7 are there sufficient reasons to believe that the students 


of the given college are on the average less than 65 inches tall ? 


7. If the standard deviation of the sample cited in Ex. 11 Sec. 14.7 is 5.8, verify the information that the population standard deviation is 5.2.

8. 11 measured values of a physical quantity have a standard deviation 0.14. Is the suspicion that the standard deviation of the population of measured values (which is an inverse measure of precision of the measuring process) is greater than 0.1 true? Assume the population to be normal, and use 5% level of significance.

9. Independent samples of sizes 30 and 55 from two normal populations having a common variance 17.6 were found to have means 23.0 and 21.9 respectively. Test at 1% significance level whether the populations also have the same mean.


10. A sample of size 10 is drawn from each of two normal populations having the same variance which is unknown. If the mean and variance of the sample from the first population are 7 and 26 and those of the sample from the second population are 4 and 10, test at 5% significance level if the two populations have the same mean.


11. The IQ's of persons, chosen at random from each of two groups, tested by Terman Merrill (M-form) were as follows:

Group I  | 116 | 121 | 125 | 125 | 127 | 128 | 131 | 132 | 135 | 137
Group II | 109 | 110 | 112 | 114 | 115 | 119 | 122 | 123 | 125 | 131

On the basis of these data can we reasonably believe that the persons of Group I have in general greater IQ than those of Group II? Assume that the populations of scores are normal and that they have a common variance. Confirm the latter hypothesis by means of an F-test.


12. For testing the effects of two types of fertilisers on the yield of wheat, 15 experimental plots of ground were available. Wheat was grown in these plots, 8 of which were treated with Fertiliser I and the remaining 7 with Fertiliser II, and the yields in kg. are given by the following table:

Fertiliser I  | 38.8 | 39.4 | 41.5 | 41.8 | 44.3 | 44.8 | 46.2 | 48.0
Fertiliser II | 39.1 | 40.2 | 40.8 | 42.1 | 42.6 | 44.5 | 44.8 | —

Do the two fertilisers really differ in their effects? Assume that the two populations of yields of wheat, which can be taken to be normal, have the same variance. Also make a significance test of this hypothesis.


13. The lengths of life of 25 electric bulbs of one kind and 15 of another kind 
were found to have standard deviations 259 and 115 hours respectively. Test at 
1% level of significance if the former kind of bulbs have less uniform quality than 
the latter, assuming that the populations in question are normal. 


14. The correlation coefficient of a sample of size 5,000 from a bivariate normal population is −.038. Test the hypothesis $\rho = 0$ against the alternative $\rho < 0$, where $\rho$ denotes the correlation coefficient of the population.


15. In Ex. 2 Sec. 15.5 test if the bivariate population of theoretical and practi- 
cal marks, which may be assumed to be normal, is at all correlated. 


CHAPTER 17 
TESTING OF HYPOTHESES II 


In this chapter we shall discuss some approximate tests, including 
what are known as tests for goodness of fit, by the method of likeli- 
hood ratio testing. Let us begin with the binomial population. 

17.1 BINOMIAL (n, p) POPULATION 
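Let the parameter $n$ be known and large and the null hypothesis to be tested be $H_0 : p = p_0$. Let $\nu$ denote an observed value of the binomial variate $X$, i.e. a sample of size 1 from the binomial population. The likelihood function is given by
$$L(\nu; p) = \binom{n}{\nu} p^{\nu} (1-p)^{n-\nu}$$
By Sec. 14.2, $\hat p = \nu/n$. Hence
$$L(\nu; \hat p) = \binom{n}{\nu} \left(\frac{\nu}{n}\right)^{\nu} \left(1 - \frac{\nu}{n}\right)^{n-\nu}$$
The likelihood function under $H_0$ is
$$L(H_0) = L(\nu; p_0) = \binom{n}{\nu} p_0^{\nu} (1 - p_0)^{n-\nu}$$
Hence, writing $q_0 = 1 - p_0$,
$$\lambda = \frac{L(H_0)}{L(\nu; \hat p)} = \left(\frac{\nu}{np_0}\right)^{-\nu} \left(\frac{n-\nu}{nq_0}\right)^{-n+\nu}$$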


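If $H_0$ is true, $\nu/n - p_0$ is a small quantity for large $n$, and we have
$$\frac{1}{\lambda} = \left(1 + \frac{\nu - np_0}{np_0}\right)^{\nu} \left(1 - \frac{\nu - np_0}{nq_0}\right)^{n-\nu}$$
or
$$-\log\lambda = \nu \log\left(1 + \frac{\nu - np_0}{np_0}\right) + (n - \nu) \log\left(1 - \frac{\nu - np_0}{nq_0}\right)$$
$$= (\nu - np_0) \log\left(1 + \frac{\nu - np_0}{np_0}\right) + (n - \nu - nq_0) \log\left(1 - \frac{\nu - np_0}{nq_0}\right) + np_0 \log\left(1 + \frac{\nu - np_0}{np_0}\right) + nq_0 \log\left(1 - \frac{\nu - np_0}{nq_0}\right)$$
Noting that $n - \nu - nq_0 = -(\nu - np_0)$,
$$-\log\lambda \simeq \frac{(\nu - np_0)^2}{np_0} + \frac{(\nu - np_0)^2}{nq_0} + np_0\left[\frac{\nu - np_0}{np_0} - \frac{(\nu - np_0)^2}{2n^2p_0^2}\right] + nq_0\left[-\frac{\nu - np_0}{nq_0} - \frac{(\nu - np_0)^2}{2n^2q_0^2}\right] = \frac{(\nu - np_0)^2}{2np_0q_0}$$
or
$$-2\log\lambda \simeq \frac{(\nu - np_0)^2}{np_0q_0} = u^2$$
where
$$u = \frac{\nu - np_0}{\sqrt{np_0q_0}} \tag{17.1.1}$$
When $n$ is large, $\dfrac{X - np_0}{\sqrt{np_0q_0}}$ is approximately normal (0, 1) under $H_0$, so that $u$ can be taken to be the required statistic. We note that as $\lambda$ ranges from 1 to 0, $-2\log\lambda$ ranges from 0 to $\infty$, and hence the critical region $0 < \lambda < \lambda_\varepsilon$ for $\lambda$ changes to $|u| > u_\varepsilon$ for $u$ where
$$P(|U| > u_\varepsilon) = \varepsilon$$
or
$$P(U > u_\varepsilon) = \tfrac{1}{2}\varepsilon \tag{17.1.2}$$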
Remark. The approximation of $-2\log\lambda$ under the hypothesis $H_0$ introduces a little crudeness in the logic of the process. This, in fact, increases the probability of Type II error, i.e. weakens the power of the test.
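To see how crude the approximation is in a concrete case, the following sketch computes both the statistic $u$ of (17.1.1) and the exact value of $-2\log\lambda$. The figures (60 successes in 100 trials, $p_0 = 0.5$) are hypothetical, and the function name is illustrative only; the two-tailed probability is obtained from the standard normal distribution through `math.erfc`.

```python
import math

def binomial_lr_test(nu, n, p0):
    """Return (u, exact -2 log lambda, two-tailed normal probability)."""
    q0 = 1.0 - p0
    u = (nu - n * p0) / math.sqrt(n * p0 * q0)      # statistic (17.1.1)

    # Exact -2 log(lambda); a cell with zero count contributes nothing.
    exact = 0.0
    if nu > 0:
        exact += nu * math.log(nu / (n * p0))
    if nu < n:
        exact += (n - nu) * math.log((n - nu) / (n * q0))
    exact *= 2.0

    # P(|U| > |u|) for a standard normal U
    p_value = math.erfc(abs(u) / math.sqrt(2.0))
    return u, exact, p_value

u, exact, p_value = binomial_lr_test(nu=60, n=100, p0=0.5)
print(f"u = {u:.3f}, u^2 = {u * u:.3f}, exact -2 log(lambda) = {exact:.3f}")
print(f"P(|U| > |u|) = {p_value:.4f}")
```

For these figures $u^2$ and the exact $-2\log\lambda$ agree to within a few hundredths, which illustrates why $u$ may serve as the statistic when $n$ is large.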


17.2 COMPARISON OF BINOMIAL POPULATIONS 

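Consider two binomial populations $(n_1, p_1)$ and $(n_2, p_2)$, of which the parameters $n_1$ and $n_2$ are known and large, and let $\nu_1$ and $\nu_2$ be independent samples of unit size respectively from them. Our problem is to construct a test for the hypothesis $H_0 : p_1 = p_2$. The joint likelihood function of two samples is given by
$$L(\nu_1, \nu_2; p_1, p_2) = \binom{n_1}{\nu_1}\binom{n_2}{\nu_2} p_1^{\nu_1} p_2^{\nu_2} (1 - p_1)^{n_1 - \nu_1} (1 - p_2)^{n_2 - \nu_2}$$
Easily we get $\hat p_1 = \nu_1/n_1$, $\hat p_2 = \nu_2/n_2$. So
$$L(\nu_1, \nu_2; \hat p_1, \hat p_2) = \binom{n_1}{\nu_1}\binom{n_2}{\nu_2} \left(\frac{\nu_1}{n_1}\right)^{\nu_1} \left(\frac{\nu_2}{n_2}\right)^{\nu_2} \left(1 - \frac{\nu_1}{n_1}\right)^{n_1 - \nu_1} \left(1 - \frac{\nu_2}{n_2}\right)^{n_2 - \nu_2}$$
Under $H_0 : p_1 = p_2 = p$ (say) the likelihood function becomes
$$L(\nu_1, \nu_2; p) = \binom{n_1}{\nu_1}\binom{n_2}{\nu_2} p^{\nu_1 + \nu_2} (1 - p)^{n_1 + n_2 - \nu_1 - \nu_2}$$
which gives
$$\hat p = \frac{\nu_1 + \nu_2}{n_1 + n_2} \tag{17.2.1}$$
$$L(\nu_1, \nu_2; \hat p) = \binom{n_1}{\nu_1}\binom{n_2}{\nu_2} \hat p^{\,\nu_1 + \nu_2} (1 - \hat p)^{n_1 + n_2 - \nu_1 - \nu_2}$$
Hence
$$\lambda = \left(\frac{\nu_1}{n_1\hat p}\right)^{-\nu_1} \left(\frac{\nu_2}{n_2\hat p}\right)^{-\nu_2} \left(\frac{1 - \nu_1/n_1}{1 - \hat p}\right)^{-n_1 + \nu_1} \left(\frac{1 - \nu_2/n_2}{1 - \hat p}\right)^{-n_2 + \nu_2}$$
Since $n_1$, $n_2$ are large, under $H_0$, $\hat p_1 \simeq \hat p$ and $\hat p_2 \simeq \hat p$, i.e. $\nu_1/n_1 - \hat p$ and $\nu_2/n_2 - \hat p$ are small quantities, and, as in the last section, we shall obtain (with $\hat q = 1 - \hat p$)
$$-2\log\lambda \simeq \frac{(\nu_1 - n_1\hat p)^2}{n_1\hat p\hat q} + \frac{(\nu_2 - n_2\hat p)^2}{n_2\hat p\hat q} = \frac{n_1 n_2}{\hat p\hat q\,(n_1 + n_2)} \left(\frac{\nu_1}{n_1} - \frac{\nu_2}{n_2}\right)^2 = u^2$$
where
$$u = \left(\frac{\nu_1}{n_1} - \frac{\nu_2}{n_2}\right) \bigg/ \sqrt{\hat p\hat q\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} \tag{17.2.2}$$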


Let $\nu_1$ and $\nu_2$ be the observed values of the binomial variates $X_1$ and $X_2$ respectively. If $H_0$ is true, $X_1$ and $X_2$ are approximately normal $(n_1p, \sqrt{n_1pq})$ and $(n_2p, \sqrt{n_2pq})$ respectively for large $n_1$, $n_2$ $(q = 1 - p)$. Since $\nu_1$ and $\nu_2$ are obtained by independent observations, $X_1$, $X_2$ are independent, and hence $X_1/n_1 - X_2/n_2$ is approximately normal $\left(0, \sqrt{pq\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}\right)$. Now the parameter $p$ which is unknown may be replaced as an approximation by its maximum likelihood estimate $\hat p$ given by (17.2.1), so that under $H_0$ the sampling distribution of $u$ is approximately standard normal. Hence, instead of $\lambda$, we can choose $u$ as the statistic of the test, the critical region for which becomes $|u| > u_\varepsilon$ where
$$P(|U| > u_\varepsilon) = \varepsilon$$
or
$$P(U > u_\varepsilon) = \tfrac{1}{2}\varepsilon \tag{17.2.3}$$


Example. In reply to the question: "Do you usually enjoy the evening alone" 96 out of 542 persons from one population and 117 out of 979 from another population said "yes". Do the two populations really have different psychological attitudes towards the given question?

For the first population the number of 'yeses' in 542 trials is a binomial $(n_1 = 542, p_1)$ variate, an observed value of which is $\nu_1 = 96$, and the same for the second population in 979 trials is a binomial $(n_2 = 979, p_2)$ variate whose observed value is $\nu_2 = 117$. Clearly, the question requires to test the null hypothesis $H_0 : p_1 = p_2$.

By (17.2.1) $\hat p = .14004$, and hence $\hat q = .85996$, so that according to (17.2.2) $u = 3.10$.

Now $P(|U| > 3.10) = .002$ which shows that the value of the statistic $u$ is highly significant, and the null hypothesis is rejected.
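The figures of this example can be reproduced with the short sketch below, which evaluates the pooled estimate (17.2.1) and the statistic (17.2.2). The function name is illustrative, and the two-tailed probability is taken from the standard normal distribution via `math.erfc`.

```python
import math

def two_binomial_test(nu1, n1, nu2, n2):
    """Compare two binomial proportions by the statistic u of (17.2.2)."""
    p_hat = (nu1 + nu2) / (n1 + n2)                  # pooled estimate (17.2.1)
    q_hat = 1.0 - p_hat
    se = math.sqrt(p_hat * q_hat * (1.0 / n1 + 1.0 / n2))
    u = (nu1 / n1 - nu2 / n2) / se                   # statistic (17.2.2)
    p_value = math.erfc(abs(u) / math.sqrt(2.0))     # P(|U| > |u|)
    return p_hat, u, p_value

p_hat, u, p_value = two_binomial_test(96, 542, 117, 979)
print(f"p_hat = {p_hat:.5f}, u = {u:.2f}, P(|U| > |u|) = {p_value:.4f}")
```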


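17.3 POISSON-μ POPULATION
$H_0 : \mu = \mu_0$

The likelihood function for a sample of size $n$,
$$L(x; \mu) = e^{-n\mu} \frac{\mu^{n\bar x}}{x_1!\, x_2! \cdots x_n!}$$
We know $\hat\mu = \bar x$ so that
$$L(x; \hat\mu) = e^{-n\bar x} \frac{\bar x^{\,n\bar x}}{x_1!\, x_2! \cdots x_n!}$$
Now
$$L(H_0) = L(x; \mu_0) = e^{-n\mu_0} \frac{\mu_0^{\,n\bar x}}{x_1!\, x_2! \cdots x_n!}$$
So
$$\lambda = e^{-n\mu_0} e^{n\bar x} \left(\frac{\bar x}{\mu_0}\right)^{-n\bar x} = e^{n(\bar x - \mu_0)} \left(1 + \frac{\bar x - \mu_0}{\mu_0}\right)^{-n\bar x}$$
or
$$-\log\lambda = -n(\bar x - \mu_0) + n\bar x \log\left(1 + \frac{\bar x - \mu_0}{\mu_0}\right)$$
$$= -n(\bar x - \mu_0) + n(\bar x - \mu_0) \log\left(1 + \frac{\bar x - \mu_0}{\mu_0}\right) + n\mu_0 \log\left(1 + \frac{\bar x - \mu_0}{\mu_0}\right)$$
For large $n$, $\bar x - \mu_0$ is a small quantity under $H_0$, and hence
$$-2\log\lambda \simeq \frac{n(\bar x - \mu_0)^2}{\mu_0} = u^2$$
where
$$u = \sqrt{\frac{n}{\mu_0}}\,(\bar x - \mu_0) \tag{17.3.1}$$
is the required statistic, the sampling distribution of which is approximately normal (0, 1) under $H_0$. As before, here also we get a two-tailed standard normal test.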


Remark. We note here that the sampling distribution of $-2\log\lambda$ under $H_0$ is approximately chi-square with 1 degree of freedom for large $n$. This is, however, a particular case of a general theorem which states that if the population distribution function satisfies certain regularity conditions, the sampling distribution of $-2\log\lambda$ under the null hypothesis is approximately chi-square for large samples.
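A minimal sketch of the Poisson test follows. The sample of 20 counts and the value $\mu_0 = 2$ are hypothetical, chosen only to show the statistic (17.3.1) next to the exact value of $-2\log\lambda$; the function name is illustrative.

```python
import math

def poisson_mean_test(xs, mu0):
    """Return (x_bar, u, exact -2 log lambda, two-tailed normal probability)."""
    n = len(xs)
    x_bar = sum(xs) / n
    u = math.sqrt(n / mu0) * (x_bar - mu0)           # statistic (17.3.1)

    # Exact -2 log(lambda) = 2 [ -n (x_bar - mu0) + n x_bar log(x_bar / mu0) ]
    log_term = n * x_bar * math.log(x_bar / mu0) if x_bar > 0 else 0.0
    exact = 2.0 * (-n * (x_bar - mu0) + log_term)

    p_value = math.erfc(abs(u) / math.sqrt(2.0))     # P(|U| > |u|)
    return x_bar, u, exact, p_value

# Hypothetical sample of counts
sample = [3, 1, 2, 4, 2, 0, 3, 2, 1, 5, 2, 3, 2, 1, 4, 3, 2, 2, 3, 1]
x_bar, u, exact, p_value = poisson_mean_test(sample, mu0=2.0)
print(f"x_bar = {x_bar:.2f}, u = {u:.3f}, u^2 = {u * u:.3f}, exact = {exact:.3f}")
print(f"P(|U| > |u|) = {p_value:.3f}")
```

Even for so small a sample the two quantities $u^2$ and $-2\log\lambda$ are close, although the normal approximation itself is only intended for large $n$.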


17.4 MULTINOMIAL DISTRIBUTION 
Let $X_k$ denote the frequency of the event $U_k$ $(k = 1, 2, \ldots, m)$ in $n$ independent trials governed by the multinomial law, so that
$$\sum X_k = n \tag{17.4.1}$$
The joint distribution of the variables $X_1, X_2, \ldots, X_m$, i.e. the distribution of the variable $(X_1, X_2, \ldots, X_m)$ subject to the constraint (17.4.1) is called the multinomial distribution. The spectrum of the multinomial distribution then consists of the points $(i_1, i_2, \ldots, i_m)$ where $i_1, i_2, \ldots, i_m = 0, 1, 2, \ldots, n$ such that $\sum i_k = n$, and the probability masses are given by
$$f_{i_1 i_2 \ldots i_m} = P(X_1 = i_1, X_2 = i_2, \ldots, X_m = i_m) = \frac{n!}{i_1!\, i_2! \cdots i_m!}\, p_1^{i_1} p_2^{i_2} \cdots p_m^{i_m} \tag{17.4.2}$$


which follows immediately from (4.6.3). 
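The probability masses (17.4.2) are easily evaluated numerically. The sketch below (function name illustrative) computes one mass for six throws of a die and checks that for $m = 2$ the formula reduces to the binomial law.

```python
import math

def multinomial_pmf(counts, probs):
    """Probability mass (17.4.2) at the point `counts` for cell probabilities `probs`."""
    n = sum(counts)
    coefficient = math.factorial(n)
    for i in counts:
        coefficient //= math.factorial(i)    # stays an exact integer at every step
    mass = float(coefficient)
    for i, p in zip(counts, probs):
        mass *= p ** i
    return mass

# Six throws of a die: probability that every face appears exactly once
print(multinomial_pmf([1, 1, 1, 1, 1, 1], [1 / 6] * 6))

# m = 2 gives the binomial distribution
binomial = math.comb(10, 3) * 0.3 ** 3 * 0.7 ** 7
print(multinomial_pmf([3, 7], [0.3, 0.7]), binomial)
```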

1. The multinomial distribution is $(m-1)$-dimensional, the spectrum being confined to the $(m-1)$-dimensional hyperplane $\sum x_k = n$ in the $m$-dimensional $(x_1, x_2, \ldots, x_m)$-space. For $m = 2$, we get the binomial distribution which is one-dimensional.

2. The multinomial distribution has $m$ parameters $p_1, p_2, \ldots, p_m$ (apart from $n$) which are subject to $\sum p_k = 1$.

3. $X_k$ is binomial $(n, p_k)$ $(k = 1, 2, \ldots, m)$.

Example. For $n$ independent throws with a die, the joint distribution of the frequencies of the six different faces is a multinomial distribution with parameters $p_k = 1/6$ $(k = 1, 2, \ldots, 6)$ and $n$.
We now state (without proof) an important theorem which is 
equivalent to a generalisation of the DeMoivre-Laplace Limit Theorem. 


Theorem. If $n \to \infty$ ($p_1, p_2, \ldots, p_m$ being kept fixed), then the distribution function of
$$\sum \frac{(X_k - np_k)^2}{np_k} \tag{17.4.3}$$
tends to the $\chi^2$-distribution function with $\nu = m - 1$ degrees of freedom.
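The theorem can be illustrated, though of course not proved, by simulation. The sketch below repeatedly throws a fair die $n = 300$ times, evaluates (17.4.3) each time, and averages the results; since a $\chi^2$ variate with $m - 1 = 5$ degrees of freedom has mean 5, the average should lie near 5. The seed, the number of repetitions and all names are arbitrary choices made for the illustration.

```python
import random

def pearson_statistic(counts, probs):
    """The quantity (17.4.3) for observed cell frequencies `counts`."""
    n = sum(counts)
    return sum((x - n * p) ** 2 / (n * p) for x, p in zip(counts, probs))

def simulate_mean_statistic(probs, n, repetitions, seed=0):
    """Average of (17.4.3) over repeated multinomial samples of size n."""
    rng = random.Random(seed)
    cells = list(range(len(probs)))
    total = 0.0
    for _ in range(repetitions):
        counts = [0] * len(probs)
        for k in rng.choices(cells, weights=probs, k=n):
            counts[k] += 1
        total += pearson_statistic(counts, probs)
    return total / repetitions

mean_statistic = simulate_mean_statistic([1 / 6] * 6, n=300, repetitions=1000)
print(f"average of the statistic over 1000 samples: {mean_statistic:.2f}")
```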


17.5 MULTINOMIAL POPULATION 

Returning to statistical terminology, we now consider the population of the multinomial variate $(X_1, X_2, \ldots, X_m)$. When the experiment $E$ is repeated $n$ times under uniform conditions (i.e. the compound experiment $E_n$ is performed once), let the counted frequency of $U_k$ be $\nu_k$ $(k = 1, 2, \ldots, m)$ so that $(\nu_1, \nu_2, \ldots, \nu_m)$ is an observed value of the random variable $(X_1, X_2, \ldots, X_m)$ or, in other words, a sample of unit size from the corresponding multinomial population.

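Estimation of parameters. Let us estimate the parameters $p_k$'s from this sample of unit size by the method of maximum likelihood, assuming $n$ to be known. We have
$$L(\nu_1, \nu_2, \ldots, \nu_m; p_1, p_2, \ldots, p_m) = \frac{n!}{\nu_1!\, \nu_2! \cdots \nu_m!}\, p_1^{\nu_1} p_2^{\nu_2} \cdots p_m^{\nu_m}$$
or
$$\log L = \sum \nu_k \log p_k + \text{terms independent of } p_k\text{'s}$$
Here we shall have to maximise $\log L$ subject to the relation $\sum p_k = 1$, and hence
$$\sum \frac{\nu_k}{p_k}\, dp_k = 0, \qquad \sum dp_k = 0$$
which give
$$\sum \left(\frac{\nu_k}{p_k} + \lambda\right) dp_k = 0$$
where $\lambda$ is a Lagrange's multiplier. It follows that
$$\frac{\nu_k}{p_k} + \lambda = 0$$
for all $k$. Hence
$$\frac{\nu_1}{p_1} = \frac{\nu_2}{p_2} = \cdots = \frac{\nu_m}{p_m} = \frac{\sum \nu_k}{\sum p_k} = n$$
or
$$\hat p_k = \nu_k/n \qquad (k = 1, 2, \ldots, m)$$
which is an expected result.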


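Test of hypothesis. $H_0 : p_k = p_{0k}$ $(k = 1, 2, \ldots, m)$ where $p_{0k}$ are given positive numbers such that $\sum p_{0k} = 1$, and $n$ (known) is large. Now
$$L(\nu_1, \nu_2, \ldots, \nu_m; \hat p_1, \hat p_2, \ldots, \hat p_m) = \frac{n!}{\nu_1!\, \nu_2! \cdots \nu_m!} \left(\frac{\nu_1}{n}\right)^{\nu_1} \left(\frac{\nu_2}{n}\right)^{\nu_2} \cdots \left(\frac{\nu_m}{n}\right)^{\nu_m}$$
and
$$L(H_0) = \frac{n!}{\nu_1!\, \nu_2! \cdots \nu_m!}\, p_{01}^{\nu_1} p_{02}^{\nu_2} \cdots p_{0m}^{\nu_m}$$
So
$$\lambda = \left(\frac{\nu_1}{np_{01}}\right)^{-\nu_1} \left(\frac{\nu_2}{np_{02}}\right)^{-\nu_2} \cdots \left(\frac{\nu_m}{np_{0m}}\right)^{-\nu_m}$$
or
$$-\log\lambda = \sum \nu_k \log\left(\frac{\nu_k}{np_{0k}}\right) = \sum \left[(\nu_k - np_{0k}) \log\left(1 + \frac{\nu_k - np_{0k}}{np_{0k}}\right) + np_{0k} \log\left(1 + \frac{\nu_k - np_{0k}}{np_{0k}}\right)\right]$$
Under $H_0$ each $\nu_k/n - p_{0k}$ is a small quantity for large $n$, and hence
$$-\log\lambda \simeq \sum \left[\frac{(\nu_k - np_{0k})^2}{np_{0k}} + np_{0k}\left\{\frac{\nu_k - np_{0k}}{np_{0k}} - \frac{(\nu_k - np_{0k})^2}{2n^2p_{0k}^2}\right\}\right]$$
or since $\sum (\nu_k - np_{0k}) = 0$
$$-2\log\lambda \simeq \sum \frac{(\nu_k - np_{0k})^2}{np_{0k}} = \chi^2$$
where
$$\chi^2 = \sum \frac{(\nu_k - np_{0k})^2}{np_{0k}} \tag{17.5.1}$$
By the theorem of the previous section the sampling distribution of $\chi^2$ under $H_0$ is approximately chi-square with $\nu = m - 1$ degrees of freedom for large $n$. Thus the statistic of the test is $\chi^2$; the critical region $0 < \lambda < \lambda_\varepsilon$ for $\lambda$ corresponds to the right tail $\chi^2 > \chi_\varepsilon^2$ of the $\chi^2$-density curve where
$$P(\chi^2 > \chi_\varepsilon^2) = \varepsilon \tag{17.5.2}$$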


Remark. The statistic $\chi^2$ defined by (17.5.1) has an important significance. Consider the given event space $S$ which contains the $m$ points $U_1, U_2, \ldots, U_m$. Now we can conceive of a statistical image of the probability distribution in $S$ on the results of $n$ repetitions of $E$ by assigning a probability mass $1/n$ to each observed event point, so that, since $\nu_k$ is the frequency of $U_k$, the latter gets a share of mass $\nu_k/n$. On the other hand, in the theoretical distribution $U_k$ carries a mass $p_{0k}$ under the hypothesis $H_0$, and hence any statistic of the form $\sum c_k \left(\dfrac{\nu_k}{n} - p_{0k}\right)^2$ where $c_k$'s are any suitable constants, of which $\chi^2$ defined by (17.5.1) is one, gives a measure of deviation of the empirical distribution from the assumed distribution. If $H_0$ is true, this measure should be small for large $n$, i.e. a large observed value of this measure will be an evidence against $H_0$. This is in accord with the choice of the right-tail of the $\chi^2$-density curve as the critical region for the above test.


Example. In a cross-breeding experiment with plants of a certain species, 240 offspring were classified into 4 classes with respect to the structure of their leaves as follows:

Class     | I   | II | III | IV | Total
Frequency | 127 | 40 | 52  | 21 | 240

According to Mendel's theory of heredity, the probabilities of the four classes should be in the ratio 9:3:3:1. Are these data consistent with the theory?

Here we are concerned with a multinomial population with 4 parameters $p_1, p_2, p_3, p_4$ and have to test the hypothesis $H_0 : p_k = p_{0k}$ $(k = 1, 2, 3, 4)$ where

$p_{01} = 9/16$, $p_{02} = 3/16$, $p_{03} = 3/16$, $p_{04} = 1/16$

Class | ν   | np_0 | (ν − np_0)²/np_0
I     | 127 | 135  | 0.474
II    | 40  | 45   | 0.556
III   | 52  | 45   | 1.089
IV    | 21  | 15   | 2.400
Total | 240 | 240  | 4.519

Hence $\chi^2 = 4.519$, and for 3 degrees of freedom $P(\chi^2 > 4.519) = .21$ so that the value of $\chi^2$ is not significant at all, and we may accept the hypothesis $H_0$. In other words, the above data may be regarded to be consistent with the theory.
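The computation of this example can be reproduced as follows. The helper `chi2_sf` is not taken from the text: it is a small routine, written for this illustration, that evaluates the upper-tail probability of the chi-square distribution for integral degrees of freedom from the recurrence $Q(a+1, y) = Q(a, y) + y^a e^{-y}/\Gamma(a+1)$ for the incomplete gamma function.

```python
import math

def chi2_sf(x, dof):
    """P(chi-square with `dof` degrees of freedom > x), dof a positive integer."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if dof % 2 == 0:
        a, q = 1.0, math.exp(-y)                  # Q(1, y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))       # Q(1/2, y)
    while a < dof / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q

def chi_square_statistic(observed, probs):
    """The statistic (17.5.1) and its degrees of freedom m - 1."""
    n = sum(observed)
    chi2 = sum((v - n * p) ** 2 / (n * p) for v, p in zip(observed, probs))
    return chi2, len(observed) - 1

observed = [127, 40, 52, 21]
probs = [9 / 16, 3 / 16, 3 / 16, 1 / 16]
chi2, dof = chi_square_statistic(observed, probs)
print(f"chi-square = {chi2:.3f}, dof = {dof}, tail probability = {chi2_sf(chi2, dof):.2f}")
```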


17.6 χ²-TEST OF GOODNESS OF FIT

We are now in a position to construct a test for a hypothesis of the type $H_0 : F(x) = F_0(x)$, where $F_0(x)$ is a given distribution function, on the basis of a large sample from the population. In other words, here we want to test if an observed sample fits an assumed population distribution, and hence such a test is customarily called a test of goodness of fit. Now the following two cases arise.

Case I. $F_0(x)$ is completely specified, i.e. contains no unknown parameters.

We propose to model this test after the $\chi^2$-test for a multinomial population discussed in the preceding section. For this, all that we have to do is to approximately reduce the given null hypothesis $H_0$ to a simple hypothesis regarding the parameters of some hypothetical multinomial population, which is done as follows:


1. Let $X$ be the parent variable connected with the experiment $E$. We divide the spectrum of $X$ into a finite number, say $m$, of suitable groups or classes $C_1, C_2, \ldots, C_m$, having no common points (for a continuous distribution these will be in the form of intervals and, for the discrete case, in the form of groups of points of the spectrum), call the event $X \in C_k$, $U_k$, and set
$$p_k = P(U_k) = P(X \in C_k) \qquad (k = 1, 2, \ldots, m) \tag{17.6.1}$$
Now noting that the events $U_1, U_2, \ldots, U_m$ are mutually exclusive and exhaustive, i.e. $U_1 + U_2 + \cdots + U_m = S$, we may roughly regard $U_1, U_2, \ldots, U_m$ as the event points of $S$.

2. If $X_k$ denotes the frequency of $U_k$ in $n$ independent repetitions of $E$ $(k = 1, 2, \ldots, m)$, then we know that $(X_1, X_2, \ldots, X_m)$ is a multinomial variate. An actual sample of size $n$ (given by $n$ repetitions of $E$) being drawn from the population of $X$, we count the number $\nu_k$ of the sample values belonging to the class $C_k$ $(k = 1, 2, \ldots, m)$ and thereby obtain a sample of unit size, $(\nu_1, \nu_2, \ldots, \nu_m)$ from the multinomial population having parameters $p_1, p_2, \ldots, p_m$.

3. Setting
$$p_{0k} = P(X \in C_k \mid H_0) \qquad (k = 1, 2, \ldots, m) \tag{17.6.2}$$
which can be exactly calculated from $F_0(x)$, the hypothesis $H_0$ may be regarded to be approximately equivalent to the multinomial hypothesis $p_k = p_{0k}$ $(k = 1, 2, \ldots, m)$, and the latter can now be subjected to a $\chi^2$-test described in Sec. 17.5.
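The three steps above can be sketched as follows for a continuous population. Everything specific in the sketch is an assumption made for illustration: the hypothetical $F_0$ is the uniform distribution on (0, 1), the classes are five equal intervals, and the artificial sample is deliberately crowded towards 0 so that the fit is poor.

```python
import bisect

def class_frequencies(sample, limits):
    """Step 2: count sample values in the classes (a_{k-1}, a_k] cut by `limits`."""
    counts = [0] * (len(limits) + 1)
    for x in sample:
        counts[bisect.bisect_left(limits, x)] += 1
    return counts

def class_probabilities(cdf, limits):
    """Step 3: p_0k = F_0(a_k) - F_0(a_{k-1}), with F_0 = 0 and 1 at the extremes."""
    values = [0.0] + [cdf(a) for a in limits] + [1.0]
    return [hi - lo for lo, hi in zip(values, values[1:])]

def chi_square_statistic(counts, probs):
    n = sum(counts)
    return sum((v - n * p) ** 2 / (n * p) for v, p in zip(counts, probs))

def uniform_cdf(x):
    # Hypothetical F_0: uniform distribution on (0, 1)
    return min(max(x, 0.0), 1.0)

limits = [0.2, 0.4, 0.6, 0.8]                          # step 1: interior class limits
sample = [((i + 0.5) / 100) ** 2 for i in range(100)]  # artificial, skewed sample

counts = class_frequencies(sample, limits)
probs = class_probabilities(uniform_cdf, limits)
chi2 = chi_square_statistic(counts, probs)
print(counts, f"chi-square = {chi2:.1f} on {len(counts) - 1} degrees of freedom")
```

Taking the classes closed on the right matches the convention $a_{k-1} < X \le a_k$; for a continuous distribution the choice does not affect the class probabilities.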
Case II. $F_0(x)$ has a known functional form but contains a number of unknown parameters $\theta_1, \theta_2, \ldots, \theta_k$. This is the case which we usually encounter in practice, e.g. we wish to test if a population is normal or binomial etc.


In this case we first replace the parameters $\theta_1, \theta_2, \ldots, \theta_k$ by their maximum likelihood estimates $\hat\theta_1, \hat\theta_2, \ldots, \hat\theta_k$ respectively in $F_0(x)$ so that $F_0(x)$ becomes completely known. It can be proved (the proof being omitted) that if $F_0(x)$ satisfies certain general conditions, we may now employ the same procedure as in Case I with the only modification that the number of degrees of freedom of $\chi^2$ is reduced by $k$ corresponding to $k$ parameters estimated from the sample, i.e. $\nu = m - k - 1$.

The approximation involved in the $\chi^2$-test holds only if $n$ is large. For practical problems, it is usually found that the approximation is fairly good if $n \geq 50$. The approximation also depends upon the fact that each $X_k$, which is binomial $(n, p_{0k})$ under $H_0$, is approximately normally distributed. This normal approximation, we remember, will not be valid if $p_{0k}$ is very small, and it has been suggested for common problems that for the proper validity of the above test each expected frequency $np_{0k}$ should be at least 5. As such if some of the expected frequencies are small, the $\chi^2$-test will be liable to error. We can, however, avoid this difficulty by combining together two or more adjacent small frequency classes to form a class for which the expected frequency is sufficiently large, i.e. exceeds 5.
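The pooling of adjacent classes can be mechanised in many ways; the sketch below shows one simple left-to-right rule, which is an assumption of this illustration and not a procedure laid down in the text. Classes are accumulated until the pooled expected frequency reaches the minimum, and any remainder at the end is joined to the last class formed. The frequencies used are hypothetical.

```python
def pool_small_classes(observed, expected, minimum=5.0):
    """Merge adjacent classes until every pooled expected frequency is >= minimum."""
    pooled_observed, pooled_expected = [], []
    acc_observed, acc_expected = 0, 0.0
    for v, e in zip(observed, expected):
        acc_observed += v
        acc_expected += e
        if acc_expected >= minimum:
            pooled_observed.append(acc_observed)
            pooled_expected.append(acc_expected)
            acc_observed, acc_expected = 0, 0.0
    if acc_expected > 0:
        if pooled_expected:
            # Leftover classes at the end join the last class already formed.
            pooled_observed[-1] += acc_observed
            pooled_expected[-1] += acc_expected
        else:
            # Nothing ever reached the minimum: keep a single pooled class.
            pooled_observed.append(acc_observed)
            pooled_expected.append(acc_expected)
    return pooled_observed, pooled_expected

# Hypothetical observed and expected frequencies
print(pool_small_classes([1, 4, 12, 3, 0], [2.0, 3.0, 10.0, 4.0, 1.0]))
print(pool_small_classes([7, 9, 1], [6.0, 7.0, 2.0]))
```

The number of degrees of freedom must afterwards be reckoned from the number of classes remaining after pooling.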


Examples

1. For the data of Ex. 1 Sec. 12.3, can the population of numbers of α-ray counts be regarded as having a Poisson distribution?

The parameter $\mu$ of the Poisson distribution is unknown and is replaced by its estimate $\hat\mu = \bar x$ for calculating the expected frequencies. The last two entries of the given table are combined together, i.e. the Poisson spectrum of all non-negative integers is divided into 15 groups given by

$i = 0, 1, 2, \ldots, 13$ and $\geq 14$

so that the expected frequency $np_{0k}$ for each group exceeds 5.
Thus if
$$f_i = e^{-\bar x}\, \frac{\bar x^{\,i}}{i!}$$
$$p_{0k} = f_k \quad (k = 0, 1, 2, \ldots, 13), \qquad p_{0k} = 1 - \sum_{i=0}^{13} f_i \quad (k = 14)$$
By the result of Ex. 1 Sec. 12.5, $\bar x = 5.88$, and the computation is carried out as follows.

i     | p_0      | np_0   | ν    | (ν − np_0)²/np_0
0     | .002795  | 9.7    | 8    | 0.298
1     | .016433  | 56.8   | 59   | 0.085
2     | .048314  | 166.9  | 177  | 0.611
3     | .094696  | 327.2  | 311  | 0.802
4     | .139203  | 480.9  | 492  | 0.256
5     | .163702  | 565.6  | 528  | 2.500
6     | .160428  | 554.3  | 601  | 3.934
7     | .134760  | 465.6  | 467  | 0.004
8     | .099045  | 342.2  | 331  | 0.367
9     | .064712  | 223.6  | 220  | 0.058
10    | .038050  | 131.5  | 121  | 0.838
11    | .020340  | 70.3   | 85   | 3.074
12    | .009966  | 34.4   | 24   | 3.144
13    | .004508  | 15.6   | 22   | 2.626
≥14   | .003048  | 10.5   | 9    | 0.214
Total | 1.000000 | 3455.1 | 3455 | 18.811

Therefore $\chi^2 = 18.811$, and the number of degrees of freedom of $\chi^2$ is 13 (remembering that the parameter $\mu$ has been estimated from the sample), for which $P(\chi^2 > 18.811) = .14$. This shows that the fit is quite satisfactory.
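The table of Example 1 can be recomputed directly from the observed frequencies and the estimate $\bar x = 5.88$. In the sketch below the expected frequencies are not rounded to one decimal as in the table, so the total differs slightly in the second decimal place from the tabulated 18.811. The observed frequencies are those of the worked table; the variable names are illustrative.

```python
import math

# Observed frequencies for i = 0, 1, ..., 13 and the pooled class i >= 14
observed = [8, 59, 177, 311, 492, 528, 601, 467, 331, 220, 121, 85, 24, 22, 9]
x_bar = 5.88                      # estimate of the Poisson parameter
n = sum(observed)

# Poisson probabilities f_i for i = 0..13, and the tail class i >= 14
f = [math.exp(-x_bar) * x_bar ** i / math.factorial(i) for i in range(14)]
p0 = f + [1.0 - sum(f)]

expected = [n * p for p in p0]
chi2 = sum((v - e) ** 2 / e for v, e in zip(observed, expected))
dof = len(observed) - 1 - 1       # one further degree lost for the estimated parameter

print(f"n = {n}, chi-square = {chi2:.2f}, degrees of freedom = {dof}")
```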


2. Can the rainfall data given in Ex. 2 Sec. 12.3 be regarded as a sample from a normal population?

The parameters $m$ and $\sigma$ of the assumed normal population have to be replaced by their estimates $\hat m = \bar x$ and $\hat\sigma = S$ respectively, so that the population distribution becomes completely known. The general procedure will be as follows.

The spectrum $(-\infty, \infty)$ of the normal distribution is divided into $m$ suitable intervals $a_{k-1} < X \leq a_k$ $(k = 1, 2, \ldots, m)$ which are our required classes. Obviously, the first class limit $a_0 = -\infty$ and the last $a_m = \infty$. Also it is necessary that the expected frequency of each of these classes is at least 5. Hence




$$p_{0k} = P(a_{k-1} < X \leq a_k)$$
where $X$ denotes the parent random variable. Now the variable $Z = (X - \bar x)/S$ is standard normal, and the corresponding standardised class limits $z_k$ are given by
$$z_k = \frac{a_k - \bar x}{S} \qquad (k = 0, 1, 2, \ldots, m)$$
whence
$$p_{0k} = P(z_{k-1} < Z \leq z_k) = \Phi(z_k) - \Phi(z_{k-1})$$
where $\Phi(z)$ is the standard normal distribution function.

In the present example, we combine the first 4 and the last 5 classes given in the original table, i.e. our classes are taken to be

$(-\infty, 15), (15, 17), \ldots, (25, 27), (27, \infty)$

so that all the expected frequencies are greater than 5. From Ex. 2 Sec. 12.5, $\bar x = 21.157$, $S = 4.880$. The computation is shown below.

a     | z     | Φ(z)   | p_0    | np_0  | ν  | (ν − np_0)²/np_0
−∞    | −∞    | 0.0000 |        |       |    |
15    | −1.26 | 0.1038 | 0.1038 | 8.61  | 11 | 0.663
17    | −0.85 | 0.1977 | 0.0939 | 7.79  | 5  | 0.999
19    | −0.44 | 0.3300 | 0.1323 | 10.98 | 9  | 0.357
21    | −0.03 | 0.4880 | 0.1580 | 13.11 | 9  | 1.288
23    | 0.38  | 0.6480 | 0.1600 | 13.28 | 21 | 4.488
25    | 0.79  | 0.7852 | 0.1372 | 11.39 | 13 | 0.228
27    | 1.20  | 0.8849 | 0.0997 | 8.28  | 6  | 0.628
∞     | ∞     | 1.0000 | 0.1151 | 9.56  | 9  | 0.033
Total | —     | —      | 1.0000 | 83.00 | 83 | 8.684

Hence $\chi^2 = 8.684$. Since two parameters have been replaced by their estimates and there are 8 classes, $\chi^2$ has 5 degrees of freedom, corresponding to which $P(\chi^2 > 8.684) = .13$. This shows that the hypothesis of normal population may be reasonably accepted.
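The computation for the normal fit may be sketched in the same way. The class limits, observed frequencies, $\bar x$ and $S$ are those of the worked table; the standardised limits are rounded to two decimals as in the table, and $\Phi$ is evaluated through `math.erf`. The helper `chi2_sf` is an assumption of this illustration (an upper-tail routine for integral degrees of freedom), not something given in the text.

```python
import math

def phi(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def chi2_sf(x, dof):
    """P(chi-square with `dof` degrees of freedom > x), dof a positive integer."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if dof % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    while a < dof / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q

limits = [15, 17, 19, 21, 23, 25, 27]        # finite class limits a_1, ..., a_7
observed = [11, 5, 9, 9, 21, 13, 6, 9]       # frequencies of the 8 classes
x_bar, s = 21.157, 4.880
n = sum(observed)

z = [round((a - x_bar) / s, 2) for a in limits]        # rounded as in the table
cumulative = [0.0] + [phi(v) for v in z] + [1.0]
p0 = [hi - lo for lo, hi in zip(cumulative, cumulative[1:])]

chi2 = sum((v - n * p) ** 2 / (n * p) for v, p in zip(observed, p0))
dof = len(observed) - 2 - 1                  # two parameters estimated
print(f"chi-square = {chi2:.2f}, dof = {dof}, tail probability = {chi2_sf(chi2, dof):.2f}")
```

Small differences from the tabulated values arise only because the table rounds the intermediate probabilities to four decimals.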


17.7 EXERCISES

1. 216 sixes were obtained in 1,000 throws with a die. Is the die honest?

2. Let A be an event connected with a random experiment E. If in 192 repetitions of E under identical conditions A occurs 61 times, can we reasonably conclude that the probability of A is 1 ? Use 5% level of significance.

3. In Ex. 14 Sec. 14.7 is the belief that more than half of the electorate will vote in favour of the candidate reasonable?

4. Of 400 mangoes selected at random from a large stock, 53 were found to be bad. Test at 1% significance level the hypothesis that on the average 10% of the mangoes were bad.

5. In 80 tosses with one coin heads were obtained 27 times, and in 96 tosses with another coin heads were obtained 31 times. Show that both the coins may be regarded as biased. Are the coins equally biased?

6. In random samples of 374 and 210 persons from the adult populations of two large cities 72.4% and 88.1% were respectively found to be literate. Do the two populations really differ in their percentages of literacy?

7. In Ex. 4 Sec. 12.6 test if the mean number of daily telephone calls may be taken to be 8, assuming that the corresponding population has a Poisson distribution.

8. The number of daily accidents on a particular road was observed for 156 days, and the mean was found to be 0.165. Is the observed data consistent with the hypothesis that the average frequency of accidents is once in every 5 days? (Assume that the population in question is Poissonian.)
9. A die was thrown 1,000 times, and the frequencies of the different faces were observed to be the following:

Face      | 1   | 2   | 3   | 4   | 5   | 6   | Total
Frequency | 105 | 143 | 181 | 157 | 198 | 216 | 1000

Test if the die is honest.

10. Of 160 offspring of a certain cross between guinea pigs, 102 were found to be red, 24 black and 34 white. According to a genetic model the probabilities of red, black and white are 9/16, 3/16 and 1/4 respectively. Test at 5% significance level if the data are consistent with the model.

11. A random experiment has three outcomes A, B, C which are exhaustive and mutually exclusive. The experiment was repeated 500 times, in which A, B, C respectively occurred 54, 258, 188 times. Test the hypothesis that the probabilities of A, B, C are in the ratio 1 : 5 : 4.

12. For the data given in Ex. 3 Sec. 12.6 test the hypotheses that the population of numbers of sixes is binomial (5, p) where (a) p = 1/6 (which serves as a test for the correctness of the die) and (b) p is unspecified.

13. Test if the population corresponding to the data of Ex. 4 Sec. 12.6 is Poissonian.

14. Test if the population corresponding to the data of Ex. 5 Sec. 12.6 is normal.


CHAPTER 18 
THEORY OF ERRORS 


18.1 INTRODUCTION 


The theory of errors was developed by Gauss, Laplace and others in 
the early 19th century long before modern statistics came into being. 
It was subsequently found that in the theory of errors we are 
simply concerned with a particular class of normal populations, and 
as such it can be treated by the more perfect concepts and termino- 
logy of present-day statistics. Accordingly, we shall here present 
the theory of errors from the points of view already developed in the 
preceding chapters and thus be able to deduce the results of the old
theory with greater ease and clearer reasoning. We have, in fact, 
already discussed various problems connected with the normal popula- 
tion, and our main task will perhaps be recounting the same in the 
present context. 


Consider the measurement of a physical quantity by means of an experimental process which may be more or less elaborate. Now it is a matter of common experience with an experimenter that if repeated observations are taken under conditions as uniform as possible, the measured values do not all coincide but instead fluctuate at random, the fluctuations being big or small depending on the accuracy of the measuring process. Clearly then, the measured values also differ at random from the true value (unknown) of the quantity, thereby committing errors which are called experimental errors or errors of observation. These errors are uncontrollable and random in nature, and hence are also called random errors or accidental errors.* The experimental errors are presumably caused by numerous subtle and uncontrollable factors which again vary at random from one observation to another, e.g. fluctuations of temperature, pressure etc. of the surroundings, atmospherical disturbances for astronomical observations, undetectable vibrations of the instruments, reading a scale by way of eye-estimation and many other known and unknown factors.

*Errors of another kind known as systematic errors may also be present which are more or less constant in nature arising, for example, from faulty calibration of the instruments, personal equation of the observer etc. These can, however, be eliminated or minimised to a great extent by care and caution on the part of the observer, but, at any rate, cannot be treated by means of a mathematical theory.


Let $X$ denote the measured value of the physical quantity whose hypothetical true value is $m$. Mathematically speaking, $X$ is a random variable, and if we assume for a theoretical model that a measurement may yield any real number, $X$ can be taken to be a continuous random variable having the spectrum $(-\infty, \infty)$. The probability density function of $X$ should then naturally contain the true value $m$ as a parameter.

The random variable
$$E = X - m \tag{18.1.1}$$
will be called the error in the measurement.

Let $n$ measurements of the quantity, performed under uniform conditions, give the set of values: $x_1, x_2, \ldots, x_n$ which is a random sample of size $n$ from the population of $X$. The corresponding errors are given by
$$e_i = x_i - m \qquad (i = 1, 2, \ldots, n) \tag{18.1.2}$$
so that $e_1, e_2, \ldots, e_n$ is a sample of size $n$ from the population of $E$.


18.2 THE NORMAL LAW 

Our first problem will be to determine the probability distribution of $X$, and we are going to show that it can be taken to be a normal distribution with mean $m$, the true value of the quantity, and standard deviation which gives an inverse measure of precision of the measuring process.

It seems quite plausible to require at the outset that the true value $m$ should be the best-fitting point to the distribution of the measured value $X$ in some sense or other. If, in particular, we adopt the principle of least squares, we shall get $m = E(X)$ (cf. Remark Sec. 8.13). Thus it is reasonable to take $m$ as the mean of the distribution of $X$, and hence the mean of the error variable $E$ is zero.


HYPOTHESIS OF ELEMENTARY ERRORS. Since the error is believed
to be produced by a large number of different and apparently unrelated
causes, we may reasonably assume that each of these gives rise to an
elementary error such that the elementary errors are mutually
independent random variables, their sum being the total error E. Now we
know that the Central Limit Theorem holds for many a sequence of
random variables, and if the same is assumed to hold for the elementary
errors, then it follows that their sum E is approximately normally
distributed. Since the mean of each elementary error is zero, the
mean of E is also zero so that X = E + m is approximately normal with
mean m. The other parameter σ, the standard deviation of X, we
know, gives an inverse measure of concentration of the probability
mass in the distribution of X about the mean m and thus is an inverse
measure of precision of measurement.
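
The hypothesis of elementary errors can be illustrated numerically. The following is only a sketch under assumed conditions that are not part of the text: each elementary error is taken to be uniform on (−1, 1), there are 50 causes, and the function names are hypothetical. It checks that the total error has mean near zero, standard deviation near $\sqrt{50/3}$, and about 68% of its mass within one standard deviation of the mean, as a normal variable would.

```python
import random
import statistics


def total_error(n_causes, rng):
    # Sum of independent elementary errors; each is uniform on (-1, 1),
    # an assumed zero-mean, non-normal distribution chosen for illustration.
    return sum(rng.uniform(-1.0, 1.0) for _ in range(n_causes))


def simulate(n_causes=50, n_trials=20000, seed=0):
    # Draw many total errors and summarise their empirical distribution.
    rng = random.Random(seed)
    errors = [total_error(n_causes, rng) for _ in range(n_trials)]
    mean = statistics.fmean(errors)
    sd = statistics.pstdev(errors)
    # Fraction of total errors lying within one standard deviation of the mean.
    within_1sd = sum(abs(e - mean) < sd for e in errors) / n_trials
    return mean, sd, within_1sd


if __name__ == "__main__":
    mean, sd, within_1sd = simulate()
    print(mean, sd, within_1sd)
```

A variance of 1/3 for each uniform elementary error gives the theoretical standard deviation $\sqrt{50/3} \approx 4.08$ for the sum.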


Formal derivation. The normal law of error may also be formally
deduced from some simple starting hypotheses whose plausibility is
guaranteed by common experience. The deduction will consist of
two stages. In the first place, we shall show that a set of elementary
postulates regarding the maximum likelihood estimate $\hat{m}$ of the
true value leads to its unique determination, viz. $\hat{m} = \bar{x}$,
the sample mean. Then, on the basis of this, we shall prove the normal
law following a method originally due to Gauss.


POSTULATES FOR $\hat{m}$

1. $\hat{m}$ is a simple function or, precisely, a continuously
differentiable function of the sample values $x_1, x_2, \ldots, x_n$.

2. $\hat{m}$ is a symmetric function of $x_1, x_2, \ldots, x_n$.

3. $\hat{m}$ is independent of the origin of measurement, i.e. for any
number h

$$\hat{m}(x_1 + h, x_2 + h, \ldots, x_n + h) = \hat{m}(x_1, x_2, \ldots, x_n) + h$$

4. $\hat{m}$ is independent of the unit of measurement, i.e. for any k > 0

$$\hat{m}(kx_1, kx_2, \ldots, kx_n) = k\,\hat{m}(x_1, x_2, \ldots, x_n)$$

or, in other words, $\hat{m}$ is a homogeneous function of the first degree.




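By Postulates 4 and 1

$$k\,\hat{m}(x_1, x_2, \ldots, x_n) = \hat{m}(kx_1, kx_2, \ldots, kx_n)
= \hat{m}(0, 0, \ldots, 0) + k \sum x_i \left[\frac{\partial \hat{m}}{\partial x_i}\right]_{(\theta k x_1,\, \theta k x_2,\, \ldots,\, \theta k x_n)} \quad (0 < \theta < 1)$$

Making k → +0

$$\hat{m}(0, 0, \ldots, 0) = 0$$

and hence dividing by k and again making k → +0, we have

$$\hat{m}(x_1, x_2, \ldots, x_n) = \sum x_i \left[\frac{\partial \hat{m}}{\partial x_i}\right]_{(0, 0, \ldots, 0)}$$

By Postulate 2 the constants $\left[\partial \hat{m}/\partial x_i\right]_{(0, 0, \ldots, 0)}$
must all be the same, say, c. Hence

$$\hat{m}(x_1, x_2, \ldots, x_n) = c \sum x_i$$

By Postulate 3, $c\sum (x_i + h) = c\sum x_i + h$ or c = 1/n. Hence
$\hat{m} = \bar{x}$, the sample mean.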


PROOF OF THE NORMAL LAW. We shall now make another
hypothesis, viz. that the probability distribution of the error E is
independent of the true value m. This hypothesis is also in keeping
with experience, from which it follows that the density function of X
is a function of x − m, i.e. we can write

$$f(x; m) = g(x - m)$$

The likelihood function of the sample is then given by

$$L = g(x_1 - m)\, g(x_2 - m) \cdots g(x_n - m)$$

or

$$\log L = \sum \log g(x_i - m)$$

The likelihood equation is

$$\frac{\partial \log L}{\partial m} = 0 \quad \text{or} \quad \sum \frac{g'(x_i - m)}{g(x_i - m)} = 0$$




Setting, for convenience, $G(x - m) = \dfrac{g'(x - m)}{g(x - m)}$, the above
equation reduces to

$$\sum G(x_i - m) = 0 \tag{i}$$

Since this would lead to the solution $m = \bar{x}$ or

$$\sum (x_i - m) = 0 \tag{ii}$$

we have

$$\sum \{G'(x_i - m) + \lambda\}\, d(x_i - m) = 0$$

where λ is a Lagrange's multiplier. It follows that

$$G'(x_i - m) + \lambda = 0 \quad (i = 1, 2, \ldots, n)$$

or, in fact,

$$G'(x - m) + \lambda = 0$$

So

$$G(x - m) + \lambda (x - m) + \mu = 0 \qquad [\mu, \text{ a constant}]$$

By (i) and (ii) μ = 0 so that

$$\frac{g'(x - m)}{g(x - m)} + \lambda (x - m) = 0$$

Integrating we get

$$f(x; m) = g(x - m) = A e^{-\lambda (x - m)^2/2} \qquad [A, \text{ a constant}]$$

Now we must have

$$\int_{-\infty}^{\infty} f(x; m)\, dx = 1 \tag{iii}$$

For the convergence of this infinite integral λ must be positive, and
hence writing $\lambda = 1/\sigma^2$ we have from (iii) $A = 1/(\sqrt{2\pi}\,\sigma)$. So

$$f(x; m) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x - m)^2/2\sigma^2}$$

i.e. X is normal (m, σ). This proves the normal law of error which
is also known as the Gaussian law in the theory of errors.




18.3 SOME DEFINITIONS 
The theory of errors has a traditional terminology of its own. Thus,
in the theory of errors the maximum likelihood estimate $\hat{m}$ of the
true value is called the most probable value of the quantity, the
standard deviation σ the root mean square error or simply the mean
square error of measurement, i.e. of X.

Modulus of precision. We have seen that σ gives an inverse
measure of precision of measurement. To get a direct measure, set

$$h = \frac{1}{\sqrt{2}\,\sigma} \tag{18.3.1}$$

h will be called the modulus of precision of measurement or of X,
which provides the required direct measure of precision. The normal
law then assumes the form

$$f(x) = \frac{h}{\sqrt{\pi}}\, e^{-h^2 (x - m)^2} \tag{18.3.2}$$
Error function. The random variable E is obviously normal (0, σ),
and hence, writing in terms of h, we have

$$P(|E| < \varepsilon) = \frac{2h}{\sqrt{\pi}} \int_0^{\varepsilon} e^{-h^2 e^2}\, de
= \frac{2}{\sqrt{\pi}} \int_0^{h\varepsilon} e^{-x^2}\, dx = \Theta(h\varepsilon)$$

where

$$\Theta(x) = \frac{2}{\sqrt{\pi}} \int_0^{x} e^{-t^2}\, dt \tag{18.3.3}$$

is called the error function which is also sometimes denoted by erf(x).
Tables were prepared for this error function which presumably served
as a substitute for the now commonly used tables of the standard normal
distribution function. The relation between the two may be easily
verified to be the following:

$$\Theta(x) = 2\Phi(\sqrt{2}\,x) - 1 \tag{18.3.4}$$
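
Relation (18.3.4) is easy to check numerically. The sketch below uses Python's standard `math.erf` for Θ and `statistics.NormalDist` for Φ; the function names `theta`, `theta_via_phi` and `prob_abs_error_below` are illustrative choices, not notation from the text.

```python
import math
from statistics import NormalDist


def theta(x):
    # Error function of (18.3.3); Python's math.erf has the same definition.
    return math.erf(x)


def theta_via_phi(x):
    # Right-hand side of (18.3.4): 2 * Phi(sqrt(2) * x) - 1.
    return 2.0 * NormalDist().cdf(math.sqrt(2.0) * x) - 1.0


def prob_abs_error_below(eps, sigma):
    # P(|E| < eps) = Theta(h * eps), with modulus of precision h from (18.3.1).
    h = 1.0 / (math.sqrt(2.0) * sigma)
    return theta(h * eps)


if __name__ == "__main__":
    print(theta(0.5), theta_via_phi(0.5))
    print(prob_abs_error_below(1.0, 1.0))  # probability of |E| < sigma
```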

Probable error. If ±Q denote the quartiles of the error E, then
Q is called the probable error of X. Q is obtained in terms of σ as
follows. We have

$$P(E > Q) = .25$$

or

$$P\left(\frac{E}{\sigma} > \frac{Q}{\sigma}\right) = .25$$

Since E/σ is standard normal, we have from Table I Q/σ = 0.67, or
from more accurate tables Q/σ = 0.6745, i.e.

$$Q = 0.6745\,\sigma \tag{18.3.5}$$

The probable error gives another inverse measure of precision of the
measuring process.
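
The constant 0.6745 in (18.3.5) is the upper quartile of the standard normal distribution. As a small check, assuming only Python's standard `statistics.NormalDist` (the helper name `probable_error` is hypothetical):

```python
from statistics import NormalDist


def probable_error(sigma):
    # Q is the upper quartile of a normal (0, sigma) error: P(E > Q) = .25.
    return NormalDist().inv_cdf(0.75) * sigma


def prob_within(q, sigma):
    # P(|E| < q) for a normal (0, sigma) error.
    dist = NormalDist(0.0, sigma)
    return dist.cdf(q) - dist.cdf(-q)


if __name__ == "__main__":
    print(round(probable_error(1.0), 4))
    print(prob_within(probable_error(2.0), 2.0))
```

By construction an error is as likely to fall inside (−Q, Q) as outside it, which is the origin of the name "probable error".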


18.4 ESTIMATION

Maximum likelihood method. We know from Sec. 14.2 that $\hat{m} = \bar{x}$,
$\hat{\sigma} = S$. We know further that a better estimate of σ², particularly
for small samples, is s², where s² is the unbiased estimate of the
population variance; s can also be obtained as a maximum likelihood
estimate of σ if, instead of the parent population, we consider a sample
of unit size from the population of the statistic s² (or S²) (cf. Remark 1
Sec. 14.2), and let us write

$$\hat{\sigma}^{\dagger} = s \tag{18.4.1}$$


Least square method. Since the measured values are attempted
approximations to the true value m, we may also employ the principle
of least squares as a method of estimation of m, i.e. we fit a point to
the observed points $x_1, x_2, \ldots, x_n$ by the least square principle, and
take the best-fitting point to be an estimate of m. For this we have
to minimise

$$\sum (x_i - m)^2 = \sum e_i^2 \tag{18.4.2}$$

as a function of m (cf. Remark Sec. 8.13). This is the simplest form
of the principle of least squares which states that the sum of the
squares of the errors should be a minimum. The normal equation is

$$\sum (x_i - m^*) = 0 \tag{18.4.3}$$

which gives

$$m^* = \bar{x} \tag{18.4.4}$$

Thus the least square estimate of m coincides with its maximum
likelihood estimate. The method of maximum likelihood, in fact, leads
to the principle of least squares, for the likelihood function of the
sample is

$$L = (2\pi)^{-n/2}\, \sigma^{-n}\, e^{-\frac{1}{2\sigma^2} \sum (x_i - m)^2}$$

so that, for fixed σ, L is maximum when $\sum (x_i - m)^2$ is minimum, which
is the principle of least squares.

Remark. The method of least squares has one advantage over
that of maximum likelihood, viz. it does not require any knowledge
regarding the population distribution. If, however, the population
is assumed to be normal, then the method of maximum likelihood
implies the principle of least squares, as we have just now seen.
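The residuals of the sample are given by

$$v_i = x_i - m^* = x_i - \bar{x} \tag{18.4.5}$$

The normal equation (18.4.3) states that

$$\sum v_i = 0 \tag{18.4.6}$$

Also

$$\sum e_i = n(\bar{x} - m)$$

So

$$v_i = e_i - \frac{1}{n}\sum e_j
= -\frac{e_1}{n} - \frac{e_2}{n} - \cdots + \frac{(n-1)\,e_i}{n} - \cdots - \frac{e_n}{n} \tag{18.4.7}$$

This equation in terms of the corresponding random variables will be

$$V_i = -\frac{E_1}{n} - \frac{E_2}{n} - \cdots + \frac{(n-1)\,E_i}{n} - \cdots - \frac{E_n}{n} \tag{18.4.8}$$

Since $X_1, X_2, \ldots, X_n$ are mutually independent random variables each
normal (m, σ), $E_1, E_2, \ldots, E_n$ are also mutually independent each
normal (0, σ), and hence $V_i$, which is a linear combination of
$E_1, E_2, \ldots, E_n$, is normally distributed such that

$$E(V_i) = 0$$

and

$$\operatorname{var}(V_i) = \frac{(n-1)\,\sigma^2}{n^2} + \frac{(n-1)^2\,\sigma^2}{n^2} = \frac{n-1}{n}\,\sigma^2$$

i.e. each residual $V_i$ is normal $\left(0, \sqrt{\tfrac{n-1}{n}}\,\sigma\right)$.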
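
The variance of a residual, $(n-1)\sigma^2/n$, can be checked by simulation. The sketch below assumes illustrative values n = 5, m = 10 and σ = 2 (none of them from the text) and hypothetical function names; it draws many normal samples and estimates the mean and variance of the first residual.

```python
import random
import statistics


def residuals(xs):
    # v_i = x_i - x_bar, as in (18.4.5).
    mean = sum(xs) / len(xs)
    return [x - mean for x in xs]


def simulate_first_residual(n=5, m=10.0, sigma=2.0, trials=20000, seed=1):
    # Empirical mean and variance of V_1 over repeated samples of size n.
    rng = random.Random(seed)
    first = []
    for _ in range(trials):
        xs = [rng.gauss(m, sigma) for _ in range(n)]
        first.append(residuals(xs)[0])
    return statistics.fmean(first), statistics.pvariance(first)


if __name__ == "__main__":
    mean_v1, var_v1 = simulate_first_residual()
    theory = (5 - 1) / 5 * 2.0 ** 2  # (n - 1) * sigma^2 / n = 3.2
    print(mean_v1, var_v1, theory)
```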




Remark. The residuals $V_1, V_2, \ldots, V_n$ are not mutually
independent but are restricted by $\sum V_i = 0$.


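Written in terms of the residuals

$$\hat{\sigma} = S = \sqrt{\frac{\sum v_i^2}{n}} \tag{18.4.9}$$

$$\hat{Q} = 0.6745\, S = 0.6745 \sqrt{\frac{\sum v_i^2}{n}} \tag{18.4.10}$$

and

$$\hat{\sigma}^{\dagger} = s = \sqrt{\frac{\sum v_i^2}{n-1}} \tag{18.4.11}$$

$$\hat{Q}^{\dagger} = 0.6745\, s = 0.6745 \sqrt{\frac{\sum v_i^2}{n-1}} \tag{18.4.12}$$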


It is customary to reckon the precision of measurement either by an
estimate of the mean square error or that of the probable error. The
above formulas give estimates of the mean square error and the
probable error for a single measurement; estimates (18.4.9) and
(18.4.10) may be used for large samples, but for small samples (18.4.11)
and (18.4.12) are preferable.


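Let us now find the corresponding quantities for the most probable
value. By Theorem I Sec. 13.4 the most probable value $\bar{X}$ is normal
$(m, \sigma/\sqrt{n})$ so that

$$\sigma(\bar{X}) = \sigma/\sqrt{n} \tag{18.4.13}$$

$$Q(\bar{X}) = 0.6745\, \sigma/\sqrt{n} \tag{18.4.14}$$

and their estimates are given by

$$\hat{\sigma}(\bar{X}) = \frac{S}{\sqrt{n}} = \frac{\sqrt{\sum v_i^2}}{n} \tag{18.4.15}$$

$$\hat{Q}(\bar{X}) = 0.6745\, \frac{S}{\sqrt{n}} = 0.6745\, \frac{\sqrt{\sum v_i^2}}{n} \tag{18.4.16}$$

and

$$\hat{\sigma}^{\dagger}(\bar{X}) = \frac{s}{\sqrt{n}} = \sqrt{\frac{\sum v_i^2}{n(n-1)}} \tag{18.4.17}$$

$$\hat{Q}^{\dagger}(\bar{X}) = 0.6745\, \frac{s}{\sqrt{n}} = 0.6745 \sqrt{\frac{\sum v_i^2}{n(n-1)}} \tag{18.4.18}$$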




Confidence interval. In old practice the result for the true value
used to be presented in the form:

most probable value ± an estimate of its probable error (or sometimes
its mean square error), i.e. using (18.4.18), as

$$\bar{x} \pm 0.6745\, \frac{s}{\sqrt{n}} \tag{18.4.19}$$

without, however, any precise indication how to use the estimate of
the probable error as a correction to $\bar{x}$.

In modern statistics the use of probable error (or the like) has been
completely and very successfully replaced by the concept of confidence
interval. The confidence intervals for the parameters of a normal
population have already been studied in Sec. 14.5, and it will be
easily recognised that (18.4.19) gives nothing but the 50% values of
the approximate confidence limits (14.5.5) for the population mean m.
The exact confidence limits are, however, given by (14.5.3) and (14.5.4).

Example. A measurement was repeated 5 times with the following results:

2.03, 2.08, 2.03, 2.01, 2.05

Find the most probable value of the quantity and the corresponding probable error.
Find also 95% confidence limits for the true value.

Set x = 2 + .01x'.

    x       x'     x'^2
    2.03    3       9
    2.08    8      64
    2.03    3       9
    2.01    1       1
    2.05    5      25

    Total  20     108

$\bar{x}' = 4.0$, $a_2' = 21.6$
$S'^2 = 5.6$, $S' = 2.4$

Hence

$\bar{x} = 2.040$, $S = .024$

By (18.4.18)

$\hat{Q}^{\dagger}(\bar{X}) = .008$

For 95% confidence limits, $t_\varepsilon$ is given by $P(t > t_\varepsilon) = .025$ corresponding to 4
degrees of freedom, whence from Table III $t_\varepsilon = 2.776$. Thus 95% confidence limits
for the true value are 2.040 ± .033 = 2.007, 2.073.
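
The arithmetic of this example can be reproduced with a few lines of Python. This is only a sketch: the function name `summarize` is hypothetical, and the value $t_\varepsilon = 2.776$ is taken from the text rather than computed.

```python
import math


def summarize(xs, t_eps):
    # Most probable value, S, s, probable error of the mean and confidence limits.
    n = len(xs)
    mean = sum(xs) / n
    sum_v2 = sum((x - mean) ** 2 for x in xs)   # sum of squared residuals
    S = math.sqrt(sum_v2 / n)                   # (18.4.9)
    s = math.sqrt(sum_v2 / (n - 1))             # (18.4.11)
    q_mean = 0.6745 * s / math.sqrt(n)          # (18.4.18)
    half_width = t_eps * s / math.sqrt(n)       # t-based half-width
    return mean, S, s, q_mean, (mean - half_width, mean + half_width)


xs = [2.03, 2.08, 2.03, 2.01, 2.05]
mean, S, s, q_mean, limits = summarize(xs, t_eps=2.776)

if __name__ == "__main__":
    print(round(mean, 3), round(S, 3), round(q_mean, 3))
    print(round(limits[0], 3), round(limits[1], 3))
```

Rounded to three decimals the results agree with the worked values 2.040, .024, .008 and the limits 2.007, 2.073.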




18.5 WEIGHTED MEASUREMENTS

Weights of measurements. Sometimes it so happens that measurements
of the same quantity or of different quantities are done by
different processes whose relative accuracies are known. Thus let
$X_1, X_2, \ldots, X_n$ be the results of n independent measurements of the
same quantity or of different quantities having moduli of precision
$h_1, h_2, \ldots, h_n$ respectively such that

$$h_1^2 : h_2^2 : \cdots : h_n^2 = w_1 : w_2 : \cdots : w_n \tag{18.5.1}$$

where $w_1, w_2, \ldots, w_n$ are known positive numbers which are called the
weights of the corresponding measurements or of the corresponding
random variables $X_1, X_2, \ldots, X_n$. Obviously, the weights are not
exactly determinate, for all of them can be multiplied by any constant,
i.e. if $w_1, w_2, \ldots, w_n$ are the weights of $X_1, X_2, \ldots, X_n$, then
$cw_1, cw_2, \ldots, cw_n$ may also be taken to be their weights, c being any
positive constant. Hence we may set

$$h_i = \sqrt{w_i}\, h \tag{18.5.2}$$

where h is an unknown constant, or for the standard deviations

$$\sigma_i = \sigma/\sqrt{w_i} \tag{18.5.3}$$

where

$$\sigma = 1/(\sqrt{2}\, h) \tag{18.5.4}$$

Presently we are interested in the repeated measurements of a single
quantity whose true value is m; in that case the random variables
$X_1, X_2, \ldots, X_n$ are mutually independent, $X_i$ being normal
$(m, \sigma/\sqrt{w_i})$ (i = 1, 2, ..., n).

In order to realise how the weights arise in practical problems,
we take the following example. Let a quantity be independently
measured 10 times by the same method corresponding to modulus of
precision h, and the results be $x_1, x_2, \ldots, x_{10}$. Suppose the
arithmetic means

$$y_1 = (x_1 + x_2)/2, \quad y_2 = (x_3 + x_4 + x_5)/3, \quad y_3 = (x_6 + x_7 + \cdots + x_{10})/5$$

are calculated, and the results are presented in the condensed form
$y_1, y_2, y_3$; then the random variables $Y_1, Y_2, Y_3$ are normal each
with mean m, the true value, but moduli of precision $h_1 = \sqrt{2}\,h$,
$h_2 = \sqrt{3}\,h$, $h_3 = \sqrt{5}\,h$ respectively so that
$h_1^2 : h_2^2 : h_3^2 = 2 : 3 : 5$.
Hence $y_1, y_2, y_3$ may be taken to be the measured values of the given
quantity corresponding to weights 2, 3, 5 respectively. This example
illustrates an important method by which weights are determined in
practice. The weights may, however, be assigned by other methods
as well in other situations.

Modified empirical distribution. We note that here the measured
values $x_1, x_2, \ldots, x_n$ do not form a sample from the same population,
but, in fact, are n independent samples of unit size from n different
normal populations having the same mean m but different standard
deviations $\sigma_i = \sigma/\sqrt{w_i}$ (i = 1, 2, ..., n). In order to represent the
distribution of these measured values taking their weights into consideration,
we construct a hypothetical discrete probability distribution, in which
the spectrum consists of the points $x_1, x_2, \ldots, x_n$ and the total mass 1
is distributed to these points in the proportion of their weights, i.e.
the mass at $x_i$ is $w_i/\sum w_i$ (i = 1, 2, ..., n). This is the modified empirical
distribution for a set of weighted measurements. Let the characteristics
of this empirical distribution be denoted by the corresponding
notations for the characteristics of a sample (Sec. 12.4) with an
overhead bar, so that the mean, variance etc. of this distribution are given
by

$$\bar{\bar{x}} = \frac{1}{W} \sum w_i x_i \tag{18.5.5}$$

which is the weighted arithmetic mean of $x_1, x_2, \ldots, x_n$,

$$\bar{S}^2 = \frac{1}{W} \sum w_i (x_i - \bar{\bar{x}})^2$$

etc. where

$$W = \sum w_i \tag{18.5.6}$$

The corresponding random variables are

$$\bar{\bar{X}} = \frac{1}{W} \sum w_i X_i \tag{18.5.7}$$



etc. Since $X_i$ is normal $(m, \sigma/\sqrt{w_i})$ and the $X_i$'s are mutually independent,
$\bar{\bar{X}}$, a linear combination of the $X_i$'s, is normal $(m, \sigma/\sqrt{W})$, or, in other
words, we may regard $\bar{\bar{X}}$ as a measured value of the given quantity of
weight W. It follows that the statistic $U = \sqrt{W}(\bar{\bar{X}} - m)/\sigma$ is standard
normal.

We may write

$$\bar{S}^2 = \frac{1}{W} \sum w_i (X_i - m)^2 - (\bar{\bar{X}} - m)^2$$

which gives

$$E(\bar{S}^2) = (n - 1)\sigma^2/W \tag{18.5.8}$$

This shows on putting

$$\nu \bar{s}^2 = n \bar{S}^2 \quad (\nu = n - 1) \tag{18.5.9}$$

that $W\bar{s}^2/n$ is an unbiased estimate of σ².

Also setting

$$Y_i = \sqrt{w_i}\,(X_i - m)/\sigma \quad (i = 1, 2, \ldots, n)$$

we get from the above expression for $\bar{S}^2$ that

$$W\bar{S}^2/\sigma^2 = \sum Y_i^2 - U^2$$

where the $Y_i$'s are mutually independent standard normal variates and
$U = \sum \sqrt{w_i}\, Y_i/\sqrt{W}$ is a linear function of these such that the sum of
the squares of the coefficients is unity, and hence by Theorem III
Sec. 9.1 $\chi^2 = W\bar{S}^2/\sigma^2$ is χ²-distributed with ν = n − 1 degrees of
freedom, and χ² and U or $\bar{\bar{X}}$ and $\bar{S}^2$ are independent.

Now χ² and U are independent, the former being χ²-distributed
with ν = n − 1 degrees of freedom and the latter standard normal, so that
by Theorem I Sec. 9.2 the statistic $t = \sqrt{n}\,(\bar{\bar{X}} - m)/\bar{s}$ has a t-distribution
with ν = n − 1 degrees of freedom.


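Estimation

MAXIMUM LIKELIHOOD METHOD. Here $X_i$ is normal $(m, \sigma/\sqrt{w_i})$,
its density function being

$$f_{X_i}(x_i; m, \sigma) = \frac{\sqrt{w_i}}{\sqrt{2\pi}\,\sigma}\, e^{-w_i (x_i - m)^2/2\sigma^2} \quad (i = 1, 2, \ldots, n)$$

Hence the joint likelihood function of $x_1, x_2, \ldots, x_n$ is given by

$$L = (2\pi)^{-n/2}\, \sigma^{-n} \sqrt{w_1 w_2 \cdots w_n}\; e^{-\frac{1}{2\sigma^2} \sum w_i (x_i - m)^2}$$

We remember that the weights $w_1, w_2, \ldots, w_n$ are known, and m, σ
are the only parameters to be estimated. Hence

$$\log L = -n \log \sigma - \frac{1}{2\sigma^2} \sum w_i (x_i - m)^2 + \text{const.}$$

and the likelihood equations are

$$\frac{\partial \log L}{\partial m} = 0, \qquad \frac{\partial \log L}{\partial \sigma} = 0$$

The first equation gives

$$\sum w_i (x_i - m) = 0$$

or

$$\hat{m} = \bar{\bar{x}} \tag{18.5.10}$$

and the second

$$-\frac{n}{\sigma} + \frac{1}{\sigma^3} \sum w_i (x_i - m)^2 = 0$$

or

$$\hat{\sigma}^2 = W\bar{S}^2/n \tag{18.5.11}$$

It follows from (18.5.8) that the estimate $\hat{\sigma}^2$ is not unbiased, which
is only expected.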
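If $\sigma_i$ and $Q_i$ respectively denote the standard deviation and
probable error of $X_i$, then

$$\hat{\sigma}_i = \hat{\sigma}/\sqrt{w_i} = \sqrt{W/nw_i}\; \bar{S}, \qquad
\hat{Q}_i = 0.6745 \sqrt{W/nw_i}\; \bar{S} \tag{18.5.12}$$

Also for the most probable value we have

$$\hat{\sigma}(\bar{\bar{X}}) = \hat{\sigma}/\sqrt{W} = \bar{S}/\sqrt{n}, \qquad
\hat{Q}(\bar{\bar{X}}) = 0.6745\, \bar{S}/\sqrt{n} \tag{18.5.13}$$

If, however, we consider the likelihood function for a sample
of unit size from the population of $\bar{s}^2$ (or $\bar{S}^2$), the maximum
likelihood estimate of σ² turns out to be the unbiased estimate $W\bar{s}^2/n$
which will be denoted by $\hat{\sigma}^{\dagger 2}$, i.e.

$$\hat{\sigma}^{\dagger 2} = W\bar{s}^2/n \tag{18.5.14}$$

and correspondingly we have

$$\hat{\sigma}_i^{\dagger} = \sqrt{W/nw_i}\; \bar{s}, \qquad
\hat{Q}_i^{\dagger} = 0.6745 \sqrt{W/nw_i}\; \bar{s} \tag{18.5.15}$$

$$\hat{\sigma}^{\dagger}(\bar{\bar{X}}) = \bar{s}/\sqrt{n}, \qquad
\hat{Q}^{\dagger}(\bar{\bar{X}}) = 0.6745\, \bar{s}/\sqrt{n} \tag{18.5.16}$$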


LEAST SQUARE METHOD. In this case the principle of least
squares will consist in fitting a point to the modified empirical
distribution of the measured values defined earlier, i.e. minimising
$\sum w_i (x_i - m)^2/W$ or

$$\sum w_i (x_i - m)^2 = \sum w_i e_i^2 \tag{18.5.17}$$

which is the weighted sum of the squares of the errors. This is the
modified form of the principle of least squares. The normal equation
on putting m = m* becomes

$$\sum w_i (x_i - m^*) = 0 \tag{18.5.18}$$

giving

$$m^* = \bar{\bar{x}} \tag{18.5.19}$$

which is the same result as before. Here also the principle of least
squares follows as a consequence of the principle of maximum
likelihood; for fixed σ, a maximum of the likelihood function L clearly
corresponds to a minimum of (18.5.17).


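The residuals $v_i$ are defined by

$$v_i = x_i - m^* = x_i - \bar{\bar{x}} \tag{18.5.20}$$

From (18.5.18)

$$\sum w_i v_i = 0 \tag{18.5.21}$$

i.e. the weighted sum of the residuals is zero.

We have

$$\bar{S}^2 = \frac{\sum w_i v_i^2}{W} \tag{18.5.22}$$

and the formulas (18.5.12) - (18.5.16) may be easily written down in
terms of the residuals. In particular, we have

$$\hat{Q}(\bar{\bar{X}}) = 0.6745 \sqrt{\frac{\sum w_i v_i^2}{nW}} \tag{18.5.23}$$

$$\hat{Q}^{\dagger}(\bar{\bar{X}}) = 0.6745 \sqrt{\frac{\sum w_i v_i^2}{(n-1)W}} \tag{18.5.24}$$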


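CONFIDENCE INTERVAL. Considering the statistic $t = \sqrt{n}\,(\bar{\bar{x}} - m)/\bar{s}$,
whose sampling distribution is the t-distribution with ν = n − 1 degrees of
freedom, a confidence interval for m having confidence coefficient 1 − ε
is easily found to be

$$\left(\bar{\bar{x}} - t_\varepsilon \frac{\bar{s}}{\sqrt{n}},\;\; \bar{\bar{x}} + t_\varepsilon \frac{\bar{s}}{\sqrt{n}}\right) \tag{18.5.25}$$

where

$$P(|t| > t_\varepsilon) = \varepsilon \tag{18.5.26}$$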
Example. Work out the example of the previous section, assuming that
the measurements have weights 1, 1, 3, 2, 3 respectively.

Set x = 2 + .01x'.

    x       w     x'    wx'    wx'^2
    2.03    1     3      3       9
    2.08    1     8      8      64
    2.03    3     3      9      27
    2.01    2     1      2       2
    2.05    3     5     15      75

    Total  10     -     37     177

$\bar{\bar{x}}' = 3.7$, $a_2' = 17.7$
$\bar{S}'^2 = 4.01$, $\bar{S}' = 2.0$

so that

$\bar{\bar{x}} = 2.037$, $\bar{S} = .020$

By (18.5.24)

$\hat{Q}^{\dagger}(\bar{\bar{X}}) = .007$

As before $t_\varepsilon = 2.776$, and hence by (18.5.25) the required confidence limits are
2.037 ± .028 = 2.009, 2.065.
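
The weighted example can be checked in the same way. The sketch below uses a hypothetical function `weighted_summary`; it takes $t_\varepsilon = 2.776$ from the text and uses the residual forms (18.5.22) and (18.5.24), with $\bar{s}/\sqrt{n} = \sqrt{\sum w_i v_i^2/((n-1)W)}$ for the half-width of the confidence interval.

```python
import math


def weighted_summary(xs, ws, t_eps):
    # Weighted mean, S-bar, probable error of the mean and confidence limits.
    n = len(xs)
    W = sum(ws)
    mean = sum(w * x for w, x in zip(ws, xs)) / W               # (18.5.5)
    sum_wv2 = sum(w * (x - mean) ** 2 for w, x in zip(ws, xs))  # weighted sum of squared residuals
    S_bar = math.sqrt(sum_wv2 / W)                              # (18.5.22)
    se_mean = math.sqrt(sum_wv2 / ((n - 1) * W))                # s-bar / sqrt(n)
    q_mean = 0.6745 * se_mean                                   # (18.5.24)
    half_width = t_eps * se_mean                                # (18.5.25)
    return mean, S_bar, q_mean, (mean - half_width, mean + half_width)


xs = [2.03, 2.08, 2.03, 2.01, 2.05]
ws = [1, 1, 3, 2, 3]
mean, S_bar, q_mean, limits = weighted_summary(xs, ws, t_eps=2.776)

if __name__ == "__main__":
    print(round(mean, 3), round(S_bar, 3), round(q_mean, 3))
    print(round(limits[0], 3), round(limits[1], 3))
```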




18.6 INDIRECT OBSERVATIONS 
Suppose we have to determine the values of a number of physical
quantities whose true values are $q_1, q_2, \ldots, q_k$, and the situation is
such that these quantities cannot be directly measured but instead
n (> k) other quantities can be measured, whose true values
$m_1, m_2, \ldots, m_n$ are known functions of $q_1, q_2, \ldots, q_k$, and let
$X_1, X_2, \ldots, X_n$ denote their independently measured values. Our problem
then will be to find estimates of $q_1, q_2, \ldots, q_k$ on the basis of the
observed values $x_1, x_2, \ldots, x_n$. Here, for simplicity, we shall consider
the case of linear functions only given by

$$\begin{aligned}
m_1 &= a_1 q_1 + b_1 q_2 + \cdots + f_1 q_k \\
m_2 &= a_2 q_1 + b_2 q_2 + \cdots + f_2 q_k \\
&\;\;\vdots \\
m_n &= a_n q_1 + b_n q_2 + \cdots + f_n q_k
\end{aligned} \tag{18.6.1}$$

where the coefficients a's, b's etc. are known.
The errors in the measurements are

$$e_i = x_i - m_i = x_i - a_i q_1 - b_i q_2 - \cdots - f_i q_k \quad (i = 1, 2, \ldots, n) \tag{18.6.2}$$


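Equally precise measurements. If σ is the standard deviation
of each measurement, $X_i$ is normal $(m_i, \sigma)$, and hence the joint
likelihood function of $x_1, x_2, \ldots, x_n$ is

$$L = (2\pi)^{-n/2}\, \sigma^{-n}\, e^{-\frac{1}{2\sigma^2} \sum (x_i - m_i)^2}$$

For fixed σ, maximising the likelihood function L amounts to
minimising

$$\sum (x_i - m_i)^2 = \sum e_i^2 \tag{18.6.3}$$

which is the principle of least squares. This is, in fact, the extended
form of the principle for combination of measurements of different
quantities. We note that the unknown parameters in $\sum e_i^2$ are
$q_1, q_2, \ldots, q_k$ and the corresponding normal (or likelihood) equations
are

$$\frac{\partial \sum e_i^2}{\partial q_1} = \frac{\partial \sum e_i^2}{\partial q_2} = \cdots = \frac{\partial \sum e_i^2}{\partial q_k} = 0$$

The first equation gives

$$\sum e_i \frac{\partial e_i}{\partial q_1} = 0 \quad \text{or} \quad \sum a_i (x_i - a_i q_1 - b_i q_2 - \cdots - f_i q_k) = 0$$

or

$$q_1 \sum a_i^2 + q_2 \sum a_i b_i + \cdots + q_k \sum a_i f_i = \sum a_i x_i$$

The other equations also reduce to similar forms, and the complete
set of normal equations becomes on putting $q_1 = q_1^*$, $q_2 = q_2^*$, ...,
$q_k = q_k^*$

$$\begin{aligned}
q_1^* \sum a_i^2 + q_2^* \sum a_i b_i + \cdots + q_k^* \sum a_i f_i &= \sum a_i x_i \\
q_1^* \sum b_i a_i + q_2^* \sum b_i^2 + \cdots + q_k^* \sum b_i f_i &= \sum b_i x_i \\
&\;\;\vdots \\
q_1^* \sum f_i a_i + q_2^* \sum f_i b_i + \cdots + q_k^* \sum f_i^2 &= \sum f_i x_i
\end{aligned} \tag{18.6.4}$$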


We assume that the coefficients a's, b's etc. are so given that the
normal equations yield unique solutions which provide the required
least square (or maximum likelihood) estimates of the parameters.
If necessary, we have

$$m_i^* = a_i q_1^* + b_i q_2^* + \cdots + f_i q_k^* \quad (i = 1, 2, \ldots, n) \tag{18.6.5}$$

Here also we can find estimates of the standard deviations or
probable errors of the obtained estimates of the parameters q's and
m's as well as confidence intervals for the parameters, but the methods
for these are rather lengthy and complicated, and we do not propose
to deal with them.


Remark. We may also pose the above problem from a slightly
different point of view. Since n > k, the equations (18.6.1) show that
only k of the n $m_i$'s are independent, and by eliminating $q_1, q_2, \ldots, q_k$
from the set (18.6.1) we can get (n − k) linear relations of constraint
among $m_1, m_2, \ldots, m_n$. Now let n physical quantities have true
values $m_1, m_2, \ldots, m_n$ which satisfy, say, (n − k) linear constraints
represented in a parametric form (18.6.1) in terms of k independent
parameters $q_1, q_2, \ldots, q_k$. Then given a set of independently measured
values $x_1, x_2, \ldots, x_n$ of these n quantities, let our problem be one of
finding estimates of $m_1, m_2, \ldots, m_n$ which satisfy the same relations
of constraint as $m_1, m_2, \ldots, m_n$. The method of procedure is the same as
before, and the solution is obtained in the $m_i^*$'s which obviously
satisfy the same constraint relations as the $m_i$'s. This latter problem
belongs to what is usually called the Theory of Adjustment.


Example 1. $q_1$ and $q_2$ are the true values of two quantities, and four other
quantities whose true values $m_1, m_2, m_3, m_4$ are given by

    m_1 = q_1 - q_2
    m_2 = q_1 + 2q_2
    m_3 = 5q_1 - 3q_2
    m_4 = 3q_1 + q_2

are measured by independent and equally reliable experiments to be 2.5, 12.7, 19.2,
24.1 respectively. Find the least square estimates of $q_1, q_2$ as well as of
$m_1, m_2, m_3, m_4$.

For writing the normal equations (18.6.4) we prepare the following table:

    a     b      x      a^2   b^2    ab      ax       bx
    1    -1     2.5      1     1     -1      2.5     -2.5
    1     2    12.7      1     4      2     12.7     25.4
    5    -3    19.2     25     9    -15     96.0    -57.6
    3     1    24.1      9     1      3     72.3     24.1

    Total  -     -      36    15    -11    183.5    -10.6

The normal equations are

    36 q_1* - 11 q_2* = 183.5
   -11 q_1* + 15 q_2* = -10.6

Solving these we get

    q_1* = 6.291, q_2* = 3.907

Hence by (18.6.5)

    m_1* = 2.384, m_2* = 14.105, m_3* = 19.734, m_4* = 22.780

The final results may be written as

    q_1* = 6.29, q_2* = 3.91
    m_1* = 2.38, m_2* = 14.11, m_3* = 19.73, m_4* = 22.78
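
The normal equations (18.6.4) can be formed and solved mechanically for any number of parameters. The following is a minimal sketch, not a method prescribed by the text: the function names are hypothetical and plain Gaussian elimination with partial pivoting is used as the solver, applied to the data of Example 1.

```python
def normal_equations(rows, xs):
    # rows[i] holds the coefficients (a_i, b_i, ..., f_i); xs[i] is the measured x_i.
    k = len(rows[0])
    A = [[sum(r[p] * r[q] for r in rows) for q in range(k)] for p in range(k)]
    b = [sum(r[p] * x for r, x in zip(rows, xs)) for p in range(k)]
    return A, b


def solve(A, b):
    # Gaussian elimination with partial pivoting on the augmented matrix.
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    q = [0.0] * n
    for r in range(n - 1, -1, -1):
        q[r] = (M[r][n] - sum(M[r][c] * q[c] for c in range(r + 1, n))) / M[r][r]
    return q


rows = [(1, -1), (1, 2), (5, -3), (3, 1)]
xs = [2.5, 12.7, 19.2, 24.1]
A, b = normal_equations(rows, xs)
q = solve(A, b)
# Fitted values m_i* from (18.6.5).
fitted = [sum(c * qj for c, qj in zip(r, q)) for r in rows]

if __name__ == "__main__":
    print(A, b)
    print([round(v, 3) for v in q])
    print([round(v, 3) for v in fitted])
```

The fitted values computed from the unrounded estimates agree with those of the example to within rounding in the third decimal place.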




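Weighted measurements. Let $X_1, X_2, \ldots, X_n$ have weights
$w_1, w_2, \ldots, w_n$ respectively. Then $X_i$ is normal $(m_i, \sigma/\sqrt{w_i})$, and

$$L = (2\pi)^{-n/2}\, \sigma^{-n} \sqrt{w_1 w_2 \cdots w_n}\; e^{-\frac{1}{2\sigma^2} \sum w_i (x_i - m_i)^2}$$

Thus, for any fixed σ, L is maximum when

$$\sum w_i (x_i - m_i)^2 = \sum w_i e_i^2 \tag{18.6.6}$$

is minimum (principle of least squares), and the normal equations
reduce to

$$\begin{aligned}
q_1^* \sum w_i a_i^2 + q_2^* \sum w_i a_i b_i + \cdots + q_k^* \sum w_i a_i f_i &= \sum w_i a_i x_i \\
q_1^* \sum w_i b_i a_i + q_2^* \sum w_i b_i^2 + \cdots + q_k^* \sum w_i b_i f_i &= \sum w_i b_i x_i \\
&\;\;\vdots \\
q_1^* \sum w_i f_i a_i + q_2^* \sum w_i f_i b_i + \cdots + q_k^* \sum w_i f_i^2 &= \sum w_i f_i x_i
\end{aligned} \tag{18.6.7}$$

The $m_i^*$'s will then be given by (18.6.5) as before.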


Example 2. Find the required estimates in Ex. 1 if the measurements have
weights 1, 2, 2, 1.5 respectively.

For convenience, we take the weights to be 2, 4, 4, 3.

    a     b      x     w    wa^2   wb^2   wab     wax      wbx
    1    -1     2.5    2      2      2     -2      5.0     -5.0
    1     2    12.7    4      4     16      8     50.8    101.6
    5    -3    19.2    4    100     36    -60    384.0   -230.4
    3     1    24.1    3     27      3      9    216.9     72.3

    Total  -     -     -    133     57    -45    656.7    -61.5

The normal equations (18.6.7) are

    133 q_1* - 45 q_2* = 656.7
    -45 q_1* + 57 q_2* = -61.5

which give

    q_1* = 6.239, q_2* = 3.847
    m_1* = 2.392, m_2* = 13.933, m_3* = 19.654, m_4* = 22.564

Thus the final results are

    q_1* = 6.24, q_2* = 3.85
    m_1* = 2.39, m_2* = 13.93, m_3* = 19.65, m_4* = 22.56
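
For two unknowns the weighted normal equations (18.6.7) can be solved directly by Cramer's rule. The sketch below, with a hypothetical function name `weighted_fit_2`, reproduces the estimates of Example 2 and also illustrates that multiplying all the weights by a common positive constant leaves the estimates unchanged.

```python
def weighted_fit_2(a, b, x, w):
    # Weighted sums appearing in the normal equations (18.6.7) for k = 2.
    s_aa = sum(wi * ai * ai for wi, ai in zip(w, a))
    s_bb = sum(wi * bi * bi for wi, bi in zip(w, b))
    s_ab = sum(wi * ai * bi for wi, ai, bi in zip(w, a, b))
    s_ax = sum(wi * ai * xi for wi, ai, xi in zip(w, a, x))
    s_bx = sum(wi * bi * xi for wi, bi, xi in zip(w, b, x))
    # Cramer's rule for the 2 x 2 system.
    det = s_aa * s_bb - s_ab * s_ab
    q1 = (s_ax * s_bb - s_ab * s_bx) / det
    q2 = (s_aa * s_bx - s_ab * s_ax) / det
    return q1, q2


a = [1, 1, 5, 3]
b = [-1, 2, -3, 1]
x = [2.5, 12.7, 19.2, 24.1]

q1, q2 = weighted_fit_2(a, b, x, [2, 4, 4, 3])
# The original weights 1, 2, 2, 1.5 differ only by the common factor 2.
p1, p2 = weighted_fit_2(a, b, x, [1, 2, 2, 1.5])

if __name__ == "__main__":
    print(round(q1, 3), round(q2, 3))
    print(round(p1, 3), round(p2, 3))
```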


18.7 EXERCISES 


1. A board, with a pair of rectangular axes marked on it, is placed on a
horizontal table, and a shot is dropped upon it from a height being aimed at the
origin. If (X, Y) denotes the random point of meeting of the shot with the board,
then assuming that X and Y are independent and that the bivariate probability
distribution of (X, Y) is symmetrical about the origin, prove that each of X, Y is
normally distributed with zero mean.


2. If the random variables $X_1, X_2, \ldots, X_n$ denote measured values of n
quantities whose true values are $m_1, m_2, \ldots, m_n$ with moduli of precision
$h_1, h_2, \ldots, h_n$ respectively, then show that the linear combination
$X = a_1 X_1 + a_2 X_2 + \cdots + a_n X_n$ is a measured value of a quantity whose
true value is m with modulus of precision h given by

$$m = a_1 m_1 + a_2 m_2 + \cdots + a_n m_n$$

$$\frac{1}{h^2} = \frac{a_1^2}{h_1^2} + \frac{a_2^2}{h_2^2} + \cdots + \frac{a_n^2}{h_n^2}$$

Deduce that if $w_1, w_2, \ldots, w_n$ are the weights of $X_1, X_2, \ldots, X_n$ respectively, the
weight w of X is given by

$$\frac{1}{w} = \frac{a_1^2}{w_1} + \frac{a_2^2}{w_2} + \cdots + \frac{a_n^2}{w_n}$$

3. $X_1, X_2, \ldots, X_n$ are independent and equally precise measurements of a
physical quantity. Show that the modulus of precision of any linear combination
$c_1 X_1 + c_2 X_2 + \cdots + c_n X_n$, where $c_1 + c_2 + \cdots + c_n = 1$, is maximum when the
linear combination happens to be the arithmetic mean of the measurements.


4. If $x_1, x_2, \ldots, x_n$ are the results of n independent measurements of a single
quantity having weights $w_1, w_2, \ldots, w_n$ respectively, then show that the weight of
the ith residual is $w_i W/(W - w_i)$ where $W = \sum w_i$.

5. The following are the results of 10 of the many observations of the outer
diameter of Saturn's ring by Bessel with the heliometer at the Königsberg
Observatory, the measured values being reduced to the mean distance of Saturn
from the Sun:

    38".91, 39".32, 38".93, 39".31, 39".17
    39".04, 39".57, 39".46, 39".30, 39".03

Assuming that the observations were made under similar conditions, find the
most probable value of the quantity, estimates of the mean square error and
probable error of a single measurement and those of the most probable value.
Find also 50% and 95% confidence limits for the true value.




6. A physical quantity was measured by 6 different methods with the results : 
0.690, 0.681, 0.673, 0.687, 0.677, 0.675, the weights of the measurements being 2, 1, 
1, 1, 3, 4 respectively. Compute the most probable value of the quantity, estimates 
of the probable errors of the individual measurements and of the most probable 
value and 95% confidence limits for the true value. 

7. The measured values of 4 quantities, whose true values $m_1, m_2, m_3, m_4$ are
given by

m,-4,- 4.24; 

m,-4,249,*24, 

m, =2q, +543 —4, 

m,-39,-4,4 74, 
where $q_1, q_2, q_3$ are the true values of three other quantities, are 4, 15, 26, 10
respectively. Find the best values of $q_1, q_2, q_3$ and $m_1, m_2, m_3, m_4$ according to the
principle of least squares, if the measurements are (a) equally precise and (b) have
weights 1, 1, 2, 4 respectively.




TABLES 


If F(x) denotes the distribution function of a random variable X,
we define $\bar{F}(x)$ by

$$\bar{F}(x) = 1 - F(x) = P(X > x)$$

which, for a continuous distribution, represents the area of the tail
of the density curve to the right of the point x.

Table I. Standard normal distribution

Here $\bar{\Phi}(x)$ is tabulated as a function of x, where Φ(x) denotes,
in particular, the standard normal distribution function. (Fig. 39)

Table II. χ²-distribution

The table gives the points χ² for different values of $\bar{F}$ and the
number of degrees of freedom n. (Fig. 40)

Table III. t-distribution

The table gives the points t for different values of $\bar{F}$ and the
number of degrees of freedom n. (Fig. 41)

Table IV. F-distribution

In Table IV, $\bar{F}$ = .05, i.e. the 5% F-points are tabulated for different
values of the parameters m and n. (Fig. 42)

Table V. F-distribution

In Table V, $\bar{F}$ = .01, i.e. the 1% F-points are tabulated for different
values of the parameters m and n. (Fig. 43)

[Figs. 39 to 43: density-curve sketches accompanying Tables I to V]




Table I. Standard Normal Distribution

     x     .00    .01    .02    .03    .04    .05    .06    .07    .08    .09

    0.0   .5000  .4960  .4920  .4880  .4841  .4801  .4761  .4721  .4681  .4641
    0.1   .4602  .4562  .4522  .4483  .4443  .4404  .4364  .4325  .4286  .4247
    0.2   .4207  .4168  .4129  .4091  .4052  .4013  .3974  .3936  .3897  .3859
    0.3   .3821  .3783  .3745  .3707  .3669  .3632  .3594  .3557  .3520  .3483
    0.4   .3446  .3409  .3372  .3336  .3300  .3264  .3228  .3192  .3156  .3121

    0.5   .3085  .3050  .3015  .2981  .2946  .2912  .2877  .2843  .2810  .2776
    0.6   .2743  .2709  .2676  .2644  .2611  .2579  .2546  .2514  .2483  .2451
    0.7   .2420  .2389  .2358  .2327  .2297  .2266  .2236  .2207  .2177  .2148
    0.8   .2119  .2090  .2061  .2033  .2005  .1977  .1949  .1922  .1894  .1867
    0.9   .1841  .1814  .1788  .1762  .1736  .1711  .1685  .1660  .1635  .1611

    1.0   .1587  .1563  .1539  .1515  .1492  .1469  .1446  .1423  .1401  .1379
    1.1   .1357  .1335  .1314  .1292  .1271  .1251  .1230  .1210  .1190  .1170
    1.2   .1151  .1131  .1112  .1094  .1075  .1057  .1038  .1020  .1003  .0985
    1.3   .0968  .0951  .0934  .0918  .0901  .0885  .0869  .0853  .0838  .0823
    1.4   .0808  .0793  .0778  .0764  .0749  .0735  .0721  .0708  .0694  .0681

    1.5   .0668  .0655  .0643  .0630  .0618  .0606  .0594  .0582  .0571  .0559
    1.6   .0548  .0537  .0526  .0516  .0505  .0495  .0485  .0475  .0465  .0455
    1.7   .0446  .0436  .0427  .0418  .0409  .0401  .0392  .0384  .0375  .0367
    1.8   .0359  .0351  .0344  .0336  .0329  .0322  .0314  .0307  .0301  .0294
    1.9   .0287  .0281  .0274  .0268  .0262  .0256  .0250  .0244  .0239  .0233

    2.0   .0228  .0222  .0217  .0212  .0207  .0202  .0197  .0192  .0188  .0183
    2.1   .0179  .0174  .0170  .0166  .0162  .0158  .0154  .0150  .0146  .0143
    2.2   .0139  .0136  .0132  .0129  .0125  .0122  .0119  .0116  .0113  .0110
    2.3   .0107  .0104  .0102  .0099  .0096  .0094  .0091  .0089  .0087  .0084
    2.4   .0082  .0080  .0078  .0075  .0073  .0071  .0069  .0068  .0066  .0064

    2.5   .0062  .0060  .0059  .0057  .0055  .0054  .0052  .0051  .0049  .0048
    2.6   .0047  .0045  .0044  .0043  .0041  .0040  .0039  .0038  .0037  .0036
    2.7   .0035  .0034  .0033  .0032  .0031  .0030  .0029  .0028  .0027  .0026
    2.8   .0026  .0025  .0024  .0023  .0023  .0022  .0021  .0021  .0020  .0019
    2.9   .0019  .0018  .0018  .0017  .0016  .0016  .0015  .0015  .0014  .0014

    3.0   .0013  .0013  .0013  .0012  .0012  .0011  .0011  .0011  .0010  .0010
    3.1   .0010  .0009  .0009  .0009  .0008  .0008  .0008  .0008  .0007  .0007
    3.2   .0007  .0007  .0006  .0006  .0006  .0006  .0006  .0005  .0005  .0005
    3.3   .0005  .0005  .0005  .0004  .0004  .0004  .0004  .0004  .0004  .0003
    3.4   .0003  .0003  .0003  .0003  .0003  .0003  .0003  .0003  .0003  .0002


The above table is abridged from Table 111 of Fisher & Yates:
"Statistical Tables for Biological, Agricultural and Medical Research"
published by Oliver & Boyd Ltd., Edinburgh, and by permission of the
authors and publishers.
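
Entries of Table I can be regenerated from the definition $\bar{\Phi}(x) = 1 - \Phi(x)$. The following sketch relies only on Python's standard `statistics.NormalDist`; the function names are hypothetical, and since the printed table may round borderline values differently, an entry can differ from the computed one by a unit in the last place.

```python
from statistics import NormalDist


def upper_tail(x):
    # Area of the standard normal density to the right of x.
    return 1.0 - NormalDist().cdf(x)


def table_row(x0):
    # The ten entries of the row labelled x0: x0 + .00, x0 + .01, ..., x0 + .09.
    return [round(upper_tail(x0 + j / 100), 4) for j in range(10)]


if __name__ == "__main__":
    for x0 in (0.0, 1.0, 1.5, 3.0):
        print(x0, table_row(x0))
```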




L69-LE SLS.0€ 652.82 966-7 
Є@1.9Е IPL 6c EL8-9Z 589.2 
8ZS-PE 889.10 01-50 TIETE 
606.2E LIZ.9Z VSO-vC 920-17 
HITTE STLYT 819-22 519.61 


886.67 607-Є© 191-1© LOE-8T 
LL8.LC 999.17 619.61 616.91 
SZI.9Z 060.02 891.81 LOS-SE 
25.00. SLHBI 9.91 L90.vT 
15720 C18.91 ©©0.51 Z6S-21 


SIS.OT 980-ST 88Є.Є1 0/0.11 
19%.81 112.1 899.11 88.6 
992.91 ЅРЕ.11 LE8-6 SI8.L 


SI8-€1 012.6 8.1 166.5 
178.01 $9. 9175 198.6 
ТИПСКИ a ee 
Too. 10. 20. $0 


LOE-ZZ 118.61 20.11 бЕЕЎЇ 


P90-1Z 151-81 200-91 6tttI 
718.61 $869] 611.61 Ove Zl 
68.81 218.1 110.71 OPE-TI 
SLC.Ll. 19-91 668-71 Ive OT 
186.51 СРР:ЕТ ISLIE cbt.6 
PS9.pI СРС.СІ 959.01 ЕРЕ.8 
TIECT 080.11 #66 YHEL 
LIOZL €08-6 £868 9PE-9 
$9.01 855.8 TETL 8.5 
967-6 68.1 $909 18: 
6LLL 686-5 8/8. Е.Е 
16069 ZP9-b $99& 99ET 
€09.p GITE 8002 98tI 
900.2 9:1 0) 550. 


ш 
178.01 
976-6 
v£0.6 
8yi.8 


L9c.L 
£6t.9 
175.5 
19. 
828. 


000. 
6с 
vcv.l 
ш 
ЗР. 


LOE-O1 15.8 19.2 
L9v.6 062.1 105.9 
2698 270.1 268:5 
108.1. 069 90.5 
686.9 8155 SiS v 
6LI9 598. OF6-E 
O8E-S 8910 STEE 
+65. 06ў& EELT 
CE #80 1917 
CLO.€ #02 59-1 
єє 019.1 SPIT 
6Р9.1 #901 TIL. 
$00.1 #85. С: 
9. 16 01. 
$90. 910. #00. 
08. 06. <6. 


Table II. χ²-Distribution




2 
The above table is abridged from Table IV of Fisher & Yates:
"Statistical Tables for Biological, Agricultural and Medical Research"
published by Oliver & Boyd Ltd., Edinburgh, and by permission of the
authors and publishers.


Є0/.6$ 268.05 296.17 ЄЛЄЎ 
ZOE-8S S8S.6h £69.9y LSS.CV 
£68.96 817.8р GIv-Sv /ЕЄ.ЇЎ 
914.55 £96.97 OPI-bY ELL-Ob 
250.75 79S 968.Cv ©88.8Є 


0@9.@$ ў1Є.ўў 99$.1ў TS9.LE 
611.15 086.©ў 010.07 $1ў.9Є 
SZL.6y 8©9.1ў 896-8€ CLI.SE 
590.8 682.0 6S9-LE HZ6-E€ 
161.9ў TEG.BE EPE-IE IL9.ZE 


SIS-Sb 99S.LE OZO-SE OIvT-IE 
0@8.Єў IGL-9E L89.€€ PHI-OF 
@1©.@ў SOS-v£ OPE-ZE 698.82 
06/-0ў 60}.ЕЄ S66-0€ L8S.L7 


9S7.0b 
L80.6€ 
916-LE 
Тр2.96 
£9S-SE 


C8E-VE 
961: 
100.2 
£18.0£ 
519.62 


10:82 
vOC.LC 
686.©© 
69L-vC 


250.68 000 СЕ ££9.62 962.92 THS-ET 


082.9 
6EL-SE 
LCO-vE 
С16.СЕ 
561.16 


519.0Е 
65.62 
бср 87 
ТОЕ. 
1.92 


30.52 
006.2 
091.02 
519.12 
97.0 


0Е8.ЕЄ 
197.СЕ 
16Е.ТЄ 
61€.0€ 
9vc.6C 


“л.8© 


960-LC 
810.97 
6£6.b7 
8S8-£7 


SLL. 
689.12 
109.07 
115.61 
81.81 


9EE-67 
9££.8c 
9£t.Lc 
9£t.9c 
OEE.ST 


[T3374 
16,02 
16627 
186.12 
16.02 


1.61 
8£€.8T 
8t£.L1 
8EE.91 
ЗЕЕ.51 


805.2 
5.7 
179.2 
677 
COLT 


198.02 
£46.61 
120.67 
101.81 
8141 


992.91 
[41241 
Orv-vl 
IES-ET 
v9.75 


v9t.£C 
517.07 
885.17 
£0L.07 


"08.61 


066.81 
790.81 
13111 
1.91 
Sty-ST 


SLS-VT 
SIL-EL 
158.01 
200.1 


665.02 
891.61 
66.81 
РІТ.8Т 
262.11 


19.91 
659.51 
88-71 
Tv0-vt 
[07423 


ЄРР.СІ 
159.11 
98.01 


"580.01 


25111 21.6 


6.81 
80L.LT 
826.91 
181.91 
61.51 


119,91 
8h8-€1 
160.£1 
SEETI 
165.11 


158.01 
LIT-O1 
06t.6 
719.8 
296.1 


90Е.91 
15.61 
ЇЎ8.Ў1 
SCIVIT 
60r.€1 


169.71 
266.1 
£67.11 
009.01 
S16.6 


180.6 
195.8 
906.1 
Sec.L. 
419.9 


£56-p1 
982.91 
95.61 
618.01 
861-21 


vcs.TI 
958.01 
961.01 
tvS.6 
L68-8 


097.8 
££9.L 
S10.L 
807.9 
218.5 


“ләцѕцапд 
pue jeng 


TABLES 


342 


Table III. t-Distribution

[Table printed sideways in the original; the tabulated values are not legible in this copy.]

The above table is abridged from Table III of Fisher & Yates:
“Statistical Tables for Biological, Agricultural and Medical Research”
published by Oliver & Boyd Ltd., Edinburgh, and by permission of the
authors and publishers.


Table IV. F-Distribution : 5 Percent Points

         1      2      3      4      5      6      8     12     24      ∞
  1 | 161.4  199.5  215.7  224.6  230.2  234.0  238.9  243.9  249.0  254.3
  2 | 18.51  19.00  19.16  19.25  19.30  19.33  19.37  19.41  19.45  19.50
  3 | 10.13   9.55   9.28   9.12   9.01   8.94   8.84   8.74   8.64   8.53
  4 |  7.71   6.94   6.59   6.39   6.26   6.16   6.04   5.91   5.77   5.63
  5 |  6.61   5.79   5.41   5.19   5.05   4.95   4.82   4.68   4.53   4.36
  6 |  5.99   5.14   4.76   4.53   4.39   4.28   4.15   4.00   3.84   3.67
  7 |  5.59   4.74   4.35   4.12   3.97   3.87   3.73   3.57   3.41   3.23
  8 |  5.32   4.46   4.07   3.84   3.69   3.58   3.44   3.28   3.12   2.93
  9 |  5.12   4.26   3.86   3.63   3.48   3.37   3.23   3.07   2.90   2.71
 10 |  4.96   4.10   3.71   3.48   3.33   3.22   3.07   2.91   2.74   2.54
 11 |  4.84   3.98   3.59   3.36   3.20   3.09   2.95   2.79   2.61   2.40
 12 |  4.75   3.88   3.49   3.26   3.11   3.00   2.85   2.69   2.50   2.30
 13 |  4.67   3.80   3.41   3.18   3.02   2.92   2.77   2.60   2.42   2.21
 14 |  4.60   3.74   3.34   3.11   2.96   2.85   2.70   2.53   2.35   2.13
 15 |  4.54   3.68   3.29   3.06   2.90   2.79   2.64   2.48   2.29   2.07
 16 |  4.49   3.63   3.24   3.01   2.85   2.74   2.59   2.42   2.24   2.01
 17 |  4.45   3.59   3.20   2.96   2.81   2.70   2.55   2.38   2.19   1.96
 18 |  4.41   3.55   3.16   2.93   2.77   2.66   2.51   2.34   2.15   1.92
 19 |  4.38   3.52   3.13   2.90   2.74   2.63   2.48   2.31   2.11   1.88
 20 |  4.35   3.49   3.10   2.87   2.71   2.60   2.45   2.28   2.08   1.84
 21 |  4.32   3.47   3.07   2.84   2.68   2.57   2.42   2.25   2.05   1.81
 22 |  4.30   3.44   3.05   2.82   2.66   2.55   2.40   2.23   2.03   1.78
 23 |  4.28   3.42   3.03   2.80   2.64   2.53   2.38   2.20   2.00   1.76
 24 |  4.26   3.40   3.01   2.78   2.62   2.51   2.36   2.18   1.98   1.73
 25 |  4.24   3.38   2.99   2.76   2.60   2.49   2.34   2.16   1.96   1.71
 26 |  4.22   3.37   2.98   2.74   2.59   2.47   2.32   2.15   1.95   1.69
 27 |  4.21   3.35   2.96   2.73   2.57   2.46   2.30   2.13   1.93   1.67
 28 |  4.20   3.34   2.95   2.71   2.56   2.44   2.29   2.12   1.91   1.65
 29 |  4.18   3.33   2.93   2.70   2.54   2.43   2.28   2.10   1.90   1.64
 30 |  4.17   3.32   2.92   2.69   2.53   2.42   2.27   2.09   1.89   1.62
 40 |  4.08   3.23   2.84   2.61   2.45   2.34   2.18   2.00   1.79   1.51
 60 |  4.00   3.15   2.76   2.52   2.37   2.25   2.10   1.92   1.70   1.39
120 |  3.92   3.07   2.68   2.45   2.29   2.17   2.02   1.83   1.61   1.25
  ∞ |  3.84   2.99   2.60   2.37   2.21   2.10   1.94   1.75   1.52   1.00


The above table is reproduced from Table V of Fisher & Yates:
“Statistical Tables for Biological, Agricultural and Medical Research”
published by Oliver & Boyd Ltd., Edinburgh, and by permission of the
authors and publishers.


Table V. F-Distribution : 1 Percent Points

         1      2      3      4      5      6      8     12     24      ∞
  1 |  4052   4999   5403   5625   5764   5859   5982   6106   6234   6366
  2 | 98.50  99.00  99.17  99.25  99.30  99.33  99.37  99.42  99.46  99.50
  3 | 34.12  30.82  29.46  28.71  28.24  27.91  27.49  27.05  26.60  26.12
  4 | 21.20  18.00  16.69  15.98  15.52  15.21  14.80  14.37  13.93  13.46
  5 | 16.26  13.27  12.06  11.39  10.97  10.67  10.29   9.89   9.47   9.02
  6 | 13.74  10.92   9.78   9.15   8.75   8.47   8.10   7.72   7.31   6.88
  7 | 12.25   9.55   8.45   7.85   7.46   7.19   6.84   6.47   6.07   5.65
  8 | 11.26   8.65   7.59   7.01   6.63   6.37   6.03   5.67   5.28   4.86
  9 | 10.56   8.02   6.99   6.42   6.06   5.80   5.47   5.11   4.73   4.31
 10 | 10.04   7.56   6.55   5.99   5.64   5.39   5.06   4.71   4.33   3.91
 11 |  9.65   7.20   6.22   5.67   5.32   5.07   4.74   4.40   4.02   3.60
 12 |  9.33   6.93   5.95   5.41   5.06   4.82   4.50   4.16   3.78   3.36
 13 |  9.07   6.70   5.74   5.20   4.86   4.62   4.30   3.96   3.59   3.16
 14 |  8.86   6.51   5.56   5.03   4.69   4.46   4.14   3.80   3.43   3.00
 15 |  8.68   6.36   5.42   4.89   4.56   4.32   4.00   3.67   3.29   2.87
 16 |  8.53   6.23   5.29   4.77   4.44   4.20   3.89   3.55   3.18   2.75
 17 |  8.40   6.11   5.18   4.67   4.34   4.10   3.79   3.45   3.08   2.65
 18 |  8.28   6.01   5.09   4.58   4.25   4.01   3.71   3.37   3.00   2.57
 19 |  8.18   5.93   5.01   4.50   4.17   3.94   3.63   3.30   2.92   2.49
 20 |  8.10   5.85   4.94   4.43   4.10   3.87   3.56   3.23   2.86   2.42
 21 |  8.02   5.78   4.87   4.37   4.04   3.81   3.51   3.17   2.80   2.36
 22 |  7.94   5.72   4.82   4.31   3.99   3.76   3.45   3.12   2.75   2.31
 23 |  7.88   5.66   4.76   4.26   3.94   3.71   3.41   3.07   2.70   2.26
 24 |  7.82   5.61   4.72   4.22   3.90   3.67   3.36   3.03   2.66   2.21
 25 |  7.77   5.57   4.68   4.18   3.86   3.63   3.32   2.99   2.62   2.17
 26 |  7.72   5.53   4.64   4.14   3.82   3.59   3.29   2.96   2.58   2.13
 27 |  7.68   5.49   4.60   4.11   3.78   3.56   3.26   2.93   2.55   2.10
 28 |  7.64   5.45   4.57   4.07   3.75   3.53   3.23   2.90   2.52   2.06
 29 |  7.60   5.42   4.54   4.04   3.73   3.50   3.20   2.87   2.49   2.03
 30 |  7.56   5.39   4.51   4.02   3.70   3.47   3.17   2.84   2.47   2.01
 40 |  7.31   5.18   4.31   3.83   3.51   3.29   2.99   2.66   2.29   1.80
 60 |  7.08   4.98   4.13   3.65   3.34   3.12   2.82   2.50   2.12   1.60
120 |  6.85   4.79   3.95   3.48   3.17   2.96   2.66   2.34   1.95   1.38
  ∞ |  6.64   4.60   3.78   3.32   3.02   2.80   2.51   2.18   1.79   1.00


The above table is reproduced from Table V of Fisher & Yates:
“Statistical Tables for Biological, Agricultural and Medical Research”
published by Oliver & Boyd Ltd., Edinburgh, and by permission of the
authors and publishers.


ANSWERS AND HINTS 


3.4 


1. $P(\bar A + \bar B) = P(\overline{AB}) = 1 - P(AB)$; $P(\bar A \bar B) = P(\overline{A + B}) = 1 - P(A + B)$
$= 1 - P(A) - P(B) + P(AB)$. Now $AB \cdot \bar A B = O$ and $AB + \bar A B = B$ so that
$P(B) = P(AB) + P(\bar A B)$; $P(\bar A + B) = P(\bar A) + P(B) - P(\bar A B) = 1 - P(A) + P(AB)$.

2. Required event $= (A - AB) + (B - AB)$.

3. $P(A_1 + A_2) = P(A_1) + P(A_2) - P(A_1 A_2) \le P(A_1) + P(A_2)$. Use
induction.


« (ll 
5. 1/2 


6. If A: at least one spade, then $\bar A$: no spade. $P(\bar A) = \binom{39}{2} \big/ \binom{52}{2} = 19/34$; $P(A) = 15/34$.

7. A, B, C: both balls white, red, black respectively; the required
event is A + B + C where A, B, C are pairwise exclusive. n = 625,
m(A) = 30, m(B) = 42, m(C) = 135; P(A + B + C) = 207/625.

8. 2/9

9. The probabilities in question are respectively $1 - (5/6)^4 = .52$
and $1 - (35/36)^{24} = .49$.

10. $(5/6)^k < 1/2$ or $k \ge 4$. Hence the answer is 4.
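The arithmetic in answers 9 and 10 can be checked with a short sketch. The reading of the problems (at least one six in 4 throws of a die, at least one double six in 24 throws of a pair of dice) is an assumption inferred from the numbers; the helper name `at_least_one` is hypothetical.

```python
from fractions import Fraction

def at_least_one(p_fail: Fraction, trials: int) -> Fraction:
    """Probability of at least one success in `trials` independent trials,
    where each trial fails with probability `p_fail`."""
    return 1 - p_fail ** trials

# Assumed reading of answer 9: a six in 4 throws, a double six in 24 throws.
four_throws = at_least_one(Fraction(5, 6), 4)
double_six = at_least_one(Fraction(35, 36), 24)

# Answer 10: smallest k with (5/6)**k < 1/2.
k = 1
while Fraction(5, 6) ** k >= Fraction(1, 2):
    k += 1

print(round(float(four_throws), 2), round(float(double_six), 2), k)
```

Exact fractions are used so that the comparison with 1/2 is free of rounding error.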

11. 2/n 


13. Let the m + n tosses be represented by m + n rooms. Total
number of event points $= 2^{m+n}$. Since m > n, only one run of m
heads is possible, which begins at the 1st or 2nd ... or (n+1)th room.
Except for the first and last cases, the rooms just before and after
the run of m heads must be sealed off with tails in order that the
run of heads may not get longer. In the first case the room just
after and in the last case the room just before the run of heads must
be filled with a tail. Hence the event ‘run of m heads’ contains
$(n-1)2^{n-2} + 2 \cdot 2^{n-1} = (n+3)2^{n-2}$ points, and the first result follows.

Let $p_i$ denote the probability of exactly i consecutive heads. Then
$p_m = (n+3)/2^{m+2}$, $p_{m+1} = (n+2)/2^{m+3}$, ..., $p_{m+n-2} = 5/2^{m+n}$ by the first
part, but $p_{m+n-1} = 2/2^{m+n}$ and $p_{m+n} = 1/2^{m+n}$. Hence
$$p_m + p_{m+1} + \cdots + p_{m+n} = \frac{\{(n+3)2^{n+2} + (n+2)2^{n+1} + \cdots + 5 \cdot 2^4 + 4 \cdot 2^3 + 3 \cdot 2^2 + 2 \cdot 2 + 1\} - 1}{2^{m+n+4}}.$$
Differentiating the relation $1 + x + x^2 + \cdots + x^{n+3} = (x^{n+4} - 1)/(x - 1)$
and putting x = 2, the second result follows.
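A brute-force count over all sequences gives an independent check of the two results in hint 13. This is a sketch for one illustrative case (m = 4, n = 3, values chosen here, not taken from the text); it takes "run of m heads" to mean that the longest run of heads has length exactly m, and the function names are hypothetical.

```python
from itertools import groupby, product

def longest_head_run(seq):
    """Length of the longest block of consecutive 'H' in a toss sequence."""
    return max((len(list(block)) for face, block in groupby(seq) if face == "H"),
               default=0)

def count_runs(m, n):
    """Count sequences of m+n tosses whose longest head run is exactly m,
    and those whose longest head run is at least m."""
    exact = at_least = 0
    for seq in product("HT", repeat=m + n):
        r = longest_head_run(seq)
        exact += r == m
        at_least += r >= m
    return exact, at_least

m, n = 4, 3  # illustrative values with m > n
exact, at_least = count_runs(m, n)
print(exact, (n + 3) * 2 ** (n - 2))      # count for a run of exactly m heads
print(at_least, (n + 2) * 2 ** (n - 1))   # count for at least m consecutive heads
```

Dividing either count by 2^(m+n) gives the corresponding probability, (n+3)/2^(m+2) and (n+2)/2^(m+1) respectively.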


ta = (e eren CIT GT - 


+ (26)* 52! _1 so that pare (белу: -1J/e* - ». 


26) =0261)* 
48) 1152 
16. 1- 18, | (13) = 0-99 


4 \ (48) |(52\ _ 46 
v. (4 |(25)/ Ge! = 553 
18. Ifthe person has to buy k tickets, then 


10000 — К\ //10000 { 
+<1-( ioo )/ 0009) st 2495» 


This gives k z- 70 so that the required answer is 70. 


or (.99)* < .5. 
19. There are 13 face values each consisting of 4 cards. By
(3.1.14) the result $= 4^{13} \big/ \binom{52}{13} \approx 0.000106$.


20. By (3.1.14) the result- (3 )(2)(2)/(4)=27- 


21. Noting that the 4 suits can be arranged among themselves
in 4! ways, the result, by (3.1.14), is $4!\binom{13}{5}\binom{13}{4}\binom{13}{3}\binom{13}{1} \big/ \binom{52}{13} = 0.129$.


22. Let the r tickets drawn be placed in r rooms. The event
in question means that the ith room is occupied by ticket no. s, so that
rooms no. 1, 2, ..., i - 1 are open to be filled by tickets no. 1, 2, ..., s - 1
and rooms no. i + 1, ..., r by tickets no. s + 1, ..., n. Hence etc.

23. If р; denotes the probability given by (3.1.15) (i > 1) and 
Po=N,/N, the result =Po +ра+'" 

24. 3036/54145 




25. Random experiment consists in drawing k + 1 balls. Total
number of event points $= (N_1 + N_2)(N_1 + N_2 - 1) \cdots (N_1 + N_2 - k)$. The
required event means that the (k+1)th drawing yields a white ball
and hence contains $(N_1 + N_2 - 1)(N_1 + N_2 - 2) \cdots (N_1 + N_2 - k) N_1$
event points so that the answer is $N_1/(N_1 + N_2)$.

26. Continue to draw the remaining balls (which are of the same 
colour) and arrange all the balls drawn in N different rooms. The 
required event is equivalent to the event that the ball in the last room 
is white. 

27. By (3.1.16) either probability = 5/72. 

28. Let the balls be numbered 1, 2, ..., r. Total number of event
points is $n^r$, since ball no. 1 may be placed in any one of the n cells
and the same is true for balls no. 2, 3, ..., r. If now i balls are placed
in a given cell, we are left with r - i balls to be distributed in n - 1
cells, and since i balls can be chosen from r balls in $\binom{r}{i}$ ways, the
number of event points contained in the required event $= \binom{r}{i}(n-1)^{r-i}$.
Hence the first result follows. Now $p_i/p_{i+1} - 1 = n\{i + 1 - (r+1)/n\}/(r - i)$
which leads to the second conclusion.


29. As in Ex. 28, the total number of event points $= (a + b)^n$.
The number of ways in which n - i objects can be distributed
to a men and i objects to b women $= \binom{n}{i} a^{n-i} b^i$ so that the number of
event points contained in the required event $= \binom{n}{1} a^{n-1} b + \binom{n}{3} a^{n-3} b^3 + \cdots$
$= \tfrac{1}{2}[(a + b)^n - (a - b)^n]$.


30. If a given cell contains i particles, we are left with r - i
particles to be placed in n - 1 cells. Hence, by Ex. 13, the
number of event points contained in the required event $= \binom{n+r-i-2}{r-i}$.

31. See Ex. 14, P. 31. The total number of event points $= \binom{n+r-1}{r}$.
If a given cell is empty, we have n - 1 cells into which r particles are
placed so that the required event contains $\binom{n+r-2}{r}$ points.


32. 1/3

33. See Sec. 3.2 Ex. 5. Answers: 1/3, 3/8

34. $\sum_{k=1}^{10} (-1)^{k+1}/k!$

35. $A_1$: ball transferred is white, $A_2$: ball transferred is black,
X: ball drawn is white. Use (3.2.7). Answer: 5/21

36. 61/140, 40/61

37. 1/3

38. Consider the first two urns only. Using (3.2.7) the probability 
of drawing a white ball from the second urn is 7/12 and hence that 
of drawing a black ball is 5/12. Similarly, considering the second and 
third urns, by (3.2.7) the required probability is 31/60. 


39. Let $p_i$: probability of drawing a white ball from the ith urn,
so that the probability of drawing a black ball from the ith urn
$= 1 - p_i$. Considering the ith and (i + 1)th urns and using (3.2.7)
$$p_{i+1} = p_i \frac{N_1 + 1}{N + 1} + (1 - p_i) \frac{N_1}{N + 1} \quad (i = 1, 2, \ldots, n - 1).$$
But $p_1 = N_1/N$ which gives $p_2 = N_1/N$ and so on.

40. If A, B are independent, P(AB) = P(A)P(B), and $A = AB + A\bar B$
where $AB \cdot A\bar B = O$ so that $P(A) = P(AB) + P(A\bar B) = P(A)P(B) + P(A\bar B)$,
or $P(A\bar B) = P(A)\{1 - P(B)\} = P(A)P(\bar B)$. This shows that A and $\bar B$ are
independent.

41. $P[A(B + C)] = P(AB + AC) = P(AB) + P(AC) - P(ABC) =$
$P(A)P(B) + P(A)P(C) - P(A)P(BC) = P(A)[P(B) + P(C) - P(BC)]$
$= P(A)P(B + C)$ which proves that A and B + C are independent. By
Ex. 40 A and $\overline{B + C} = \bar B \bar C$ are independent etc.

42. By (3.1.9) the required probability $= \sum p_1 - \sum p_1 p_2 + \sum p_1 p_2 p_3 - \cdots$
$= 1 - (1 - p_1)(1 - p_2) \cdots (1 - p_n)$.

43. Not mutually independent.
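The recursion in hint 39 can be iterated exactly to see that the probability of a white ball stays at N₁/N from urn to urn. This is a sketch under the assumption that every urn starts with N₁ white balls out of N and that one ball is passed on before each draw; the function name and the sample values 3, 7, 5 are illustrative only.

```python
from fractions import Fraction

def white_probabilities(n1: int, n: int, urns: int):
    """Iterate p_{i+1} = p_i (N1+1)/(N+1) + (1-p_i) N1/(N+1), starting from p_1 = N1/N."""
    p = Fraction(n1, n)
    out = [p]
    for _ in range(urns - 1):
        p = p * Fraction(n1 + 1, n + 1) + (1 - p) * Fraction(n1, n + 1)
        out.append(p)
    return out

probs = white_probabilities(3, 7, 5)  # illustrative: 3 white out of 7, five urns
print(probs)
```

The value N₁/N is a fixed point of the recursion, which is why every entry of the list is the same fraction.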


4.9 


1. A: spade, B: heart or diamond, C: queen. First answer
$= P\{(A, B, C)\} = P(A)P(B)P(C) = \tfrac14 \cdot \tfrac12 \cdot \tfrac{1}{13} = 1/104$. Second answer $= 3!/104$
$= 3/52$.




2. Answer=1-%.3= 
3. (а) 5/16 (b) 1/2 (с) 13/16 
4. 80/243 
5. 1/64 
6. $1 - (.8)^{10} - 2(.8)^9 = .624$, 32
7. (D* 
8. 1568/6561 

Pe P^ 1/5, а= 5,6. Answer- lode + [+ n 

1 m m 
"T 3 {a+ P) +(@-р) ) 

10. (a) 16 or 17 (b) 33 

12. Use (4.4.3). 

13. See Ex. 3, Sec. 4.4. 


1 El i- n-i 
iu. A zio а-л 


s Pg 


16. Let E,, Ez, Ea denote two trials of the experiment of obser- 
ving if A, B, C FESON attends the class, i.e. two Bernoulli trials 
with probability of success 4, $, t respectively. We are concerned 
with. the joint performance of E,, Es, Е, which are independent. 
Required probability 


ERG TGO (02-89 )- 
СЕО ET ШЧ (3) 
ERGEG- 


Ble 


17. See Ex. 6 Sec. 4.4. (a) [j^ — G - 1)"1/6" (b) [07 - iy - (6-16 
ы 18. Let Y—required event. Then X- at least one of the n tickets 
oes not appear in k drawings, and hence if A; —ticket no. і does not 
appear іп К drawings (i=1, 2, n), Х=хА,. By (3.1.9) and using 




synimetry P(X) =nP(A,)- ( Е Jea 145) +++. Now the probability that 


ticket no. 1 does not appear in a single drawing = (фә o 2) armi, 
n 


and since the drawings are independent, Р(4,) = (n - m)*/n*. Similarly, 
А, Аз tickets no. 1 and 2 both do not appear іп k drawings so that 


PAA) =("5, | d k " (my nmt" dis 

19. n = 500, p = 1/365, np = 500/365 = μ. By (4.4.8) answer $\simeq \mu e^{-\mu}$
= .348.

20. $5^4 e^{-5}/4!$ (approx.)

21. 1/e (approx.)
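The Poisson approximation in answer 19 can be compared with the exact binomial value. This is a sketch assuming the event is "exactly one success in n = 500 Bernoulli trials with p = 1/365", a reading inferred from the value .348 rather than stated in the hint.

```python
import math

n, p = 500, 1 / 365
mu = n * p  # Poisson parameter, mu = np

# Exact binomial probability of exactly one success.
exact = math.comb(n, 1) * p * (1 - p) ** (n - 1)
# Poisson approximation for the same event.
approx = mu * math.exp(-mu)

print(round(exact, 4), round(approx, 4))
```

Both values agree to three decimal places, which is the accuracy quoted in the answer.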

22. Required probability = Xp, Pr. Prs dr dis whene k,,k-, 
--k, are all different, each having one of the values 1, 2,.-n. 

23. Ву (4.5.1) answer =}. 8.3 1..7 

24. Ву (4.6.3) answer =8 !/21 3 ! 3! 6% 

25. 9/50 

26. When a pair of dice are thrown the probability of six, $p_1 = 5/36$
and that of seven, $p_2 = 1/6$. Then $q_1 = 1 - p_1 = 31/36$, $q_2 = 1 - p_2 = 5/6$.
Let $A_n$: A wins in 2n + 1 throws, i.e. six appears at the
(2n + 1)th throw but the 1st, 3rd, ..., (2n - 1)th throws result in ‘not six’
and the 2nd, 4th, ..., 2nth throws result in ‘not seven’ (n = 0, 1, 2, ...).
Then $X = \sum_{n=0}^{\infty} A_n$ is the required event, where $A_0, A_1, \ldots$ are pairwise
exclusive events connected with an infinite sequence of independent
throws of a pair of dice. Hence
$$P(X) = \sum_{n=0}^{\infty} P(A_n) = \sum_{n=0}^{\infty} q_1^n q_2^n p_1 = p_1 (1 - q_1 q_2)^{-1} = 30/61.$$

27. $A_n$: the first person wins in 3n + 1 throws, i.e. the first head
appears at the (3n + 1)th throw (n = 0, 1, 2, ...). The $A_n$'s are pairwise
exclusive events connected with an infinite sequence of Bernoulli trials
with probability of success $p = \tfrac12$. Probability of the first person’s
winning $= P\left(\sum_{n=0}^{\infty} A_n\right) = \sum_{n=0}^{\infty} P(A_n) = \sum_{n=0}^{\infty} (1 - p)^{3n} p = 4/7$. Similarly, the
probabilities of win by the second and third persons are 2/7 and 1/7
respectively.
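Both answers 26 and 27 sum a geometric series of the form p·qⁿ. A minimal sketch with exact fractions follows; the helper name `win_probability` is hypothetical.

```python
from fractions import Fraction

def win_probability(p_first: Fraction, q_cycle: Fraction) -> Fraction:
    """Sum of p_first * q_cycle**n over n = 0, 1, 2, ... (a geometric series)."""
    return p_first / (1 - q_cycle)

# Answer 26: A needs a six (p1), B needs a seven (p2), and A throws first.
p1, p2 = Fraction(5, 36), Fraction(1, 6)
a_wins = win_probability(p1, (1 - p1) * (1 - p2))

# Answer 27: three persons toss a fair coin in turn; the first head wins.
p = Fraction(1, 2)
q = 1 - p
players = [win_probability(q ** k * p, q ** 3) for k in range(3)]

print(a_wins, players)
```

The three probabilities in the second case add up to 1, as they must, since a head appears eventually with probability 1.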




28. Let X: event of scoring n points. Then X = A + B where A, B
are mutually exclusive events defined by: A: head in the last throw
scoring 1 point and scoring n - 1 points in all throws except the last,
and B: tail in the last throw scoring 2 points and n - 2 points in all
throws except the last. $P(A) = \tfrac12 p_{n-1}$, $P(B) = \tfrac12 p_{n-2}$. Hence $p_n =$
$P(X) = \tfrac12 (p_{n-1} + p_{n-2})$ or $2p_n - p_{n-1} - p_{n-2} = 0$. We have $(-2)^n (p_n - p_{n-1})$
$= (-2)^{n-1} (p_{n-1} - p_{n-2}) = \cdots = 4(p_2 - p_1)$. But $p_1 = \tfrac12$, $p_2 = \tfrac12 + \tfrac12 \cdot \tfrac12 = \tfrac34$,
so that $p_n - p_{n-1} = (-1)^n/2^n$. Replacing n by n - 1, n - 2, ..., 2 and
adding, $p_n = \tfrac13 \{2 + (-1)^n/2^n\} \to 2/3$ as $n \to \infty$.
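The closed form in answer 28 can be checked against the recurrence directly. A short sketch with exact fractions follows; the function names are hypothetical.

```python
from fractions import Fraction

def score_probabilities(n_max: int):
    """p_n from the recurrence p_n = (p_{n-1} + p_{n-2}) / 2 with p_1 = 1/2, p_2 = 3/4."""
    p = {1: Fraction(1, 2), 2: Fraction(3, 4)}
    for n in range(3, n_max + 1):
        p[n] = (p[n - 1] + p[n - 2]) / 2
    return p

def closed_form(n: int) -> Fraction:
    """p_n = (2 + (-1)^n / 2^n) / 3."""
    return (2 + Fraction((-1) ** n, 2 ** n)) / 3

p = score_probabilities(20)
print(all(p[n] == closed_form(n) for n in p), float(p[20]))
```

The values oscillate around 2/3 with an error that halves at each step, in line with the limit stated in the hint.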

29. Let $A_1$, $A_2$: (n - 1)th day is dry, wet respectively. X: nth day
is dry. Given $P(X|A_1) = p$, $P(X|A_2) = p'$, $P(A_1) = u_{n-1}$, $P(X) = u_n$.
Then $P(A_2) = 1 - P(A_1) = 1 - u_{n-1}$. By (3.2.7) $u_n = u_{n-1} p + (1 - u_{n-1}) p'$
or $u_n - (p - p') u_{n-1} - p' = 0$. For p = 3/4, p' = 1/4, $u_n - \tfrac12 u_{n-1} - \tfrac14 = 0$
or $2^n (2u_n - 1) = 2^{n-1} (2u_{n-1} - 1) = \cdots = 2(2u_1 - 1) = 2$, since $u_1 = 1$. Hence
$u_n = \tfrac12 + 1/2^n$.
30. a-(1,0,0,). (i) By (4.8.6) answer = TiPiiPisDostmaDisDaopos 
+л1 різ Рза Раз = 0 (ii) answer = p, ,?— 1/6 (ii) answer = д.09) = 0, 
31. First prove by induction that 
Pray —(i4 re tg- E a 
2-р-Ф1-9 1-р!" 2-p-q \q-11-q 
Hence probability distribution at the nth trial = P^-2 
1 +4-1)"-® 
“zope 01-0 0255177 i p aug 1) 
zi(p —- 1) +24(1- a)l 


55: = (1-4, 1- p) as n e, since lIp+q-1| <1. 


5.10 
2. $G(x) = \frac{1}{2h} \int_{-h}^{h} F(x + t)\,dt$. If x < x', $F(x + t) \le F(x' + t)$ so that, on
integration from -h to h, $G(x) \le G(x')$. For $x - h \le t \le x + h$,
$F(x - h) \le F(t) \le F(x + h)$ which gives $F(x - h) \le G(x) \le F(x + h)$.
Making $x \to \infty$ or $-\infty$, we get $G(\infty) = 1$, $G(-\infty) = 0$. G(x) is continuous
everywhere.




8. Use (5.5.3). Answers: F(x)=i@ + 1)/n(i+1) (i x«i) 
(i —1, 2,5) ; (n – 3)/4п, (n — 5)/6n | 

4. $A = F(-\infty) = 0$, $D = F(\infty) = 1$, $1/6 = P(X = 0) = F(0) - F(-0) = C - B$,
$2/3 = P(X > 1) = 1 - F(1) = 1 - C$. Hence B = 1/6, C = 1/3.

5. Spectrum 0, 1, 2, ..., 6; corresponding probabilities: 1/16, 1/8,
3/16, 1/4, 3/16, 1/8, 1/16

6. (a) Binomial (5, .4) (b) Spectrum: 0, 1, 2, 3, 4; corresponding
probabilities: 1/42, 5/21, 10/21, 5/21, 1/42

i n it 

zo slm 2n Sy [1 
8. x:=i(i=0, bliid 
9. See Ex. 12 Sec. 4.9. 


oc 
RED LAN. et 
n! 


ta- n l fe 
д 
11. See Poisson process, Sec. 5.6. The number of wars in t years
is Poisson distributed with parameter $\mu = \lambda t$ where $\lambda = 1/15$, t = 25 so
that $\mu = 5/3$, and the required probability is $e^{-5/3}$.
12. The number of misprints in t pages is a Poisson variate with
parameter $\mu = \lambda t$ where $\lambda = 500/500 = 1$, t = 1 so that $\mu = 1$ and so the
answer $= e^{-1}\left(1 + \frac{1}{1!} + \frac{1}{2!} + \frac{1}{3!}\right) = 8/3e$.


10. 1 ferras E 


13. The number of bacteria in t c.c. of water is Poisson distributed 
with parameter w~=at where 4—10?/10^ —10, t=1 so that д=10. 
Hence the required probability =e-*°. ‘ 

14. $f(x) \ge 0$ everywhere and $\int_{-\infty}^{\infty} f(x)\,dx = \int_{-1}^{1} |x|\,dx = 1$. Hence f(x) is a
possible density function. $F(x) = 0$ $(x < -1)$, $(1 - x^2)/2$ $(-1 \le x < 0)$,
$(1 + x^2)/2$ $(0 \le x < 1)$, $1$ $(x \ge 1)$.

15. 2, 3/4
16. $1/\pi$, $2\tan^{-1} e/\pi$, $2 - 4\tan^{-1} e/\pi$
17. Probability masses 1/3, 1/6, 1/6, 1/3 at 0, 1, 2, 3 respectively
18. 1/2
20. $1/\sqrt{2}$




21. КІ(1+ 0) 
23. Uniform in (0, 1). Calculate F(x) for |X|.
24. $X = \cos Y$, Y uniformly distributed over $(0, \pi)$. Use (5.9.3).
25. $Y = \mu + \lambda \cot X$, X uniformly distributed over $(0, \pi)$. Use (5.9.3).
26. $e^{-y}$ $(0 < y < \infty)$
27. f(y)ee-üos |y Jon (0 <y « e) 
28. $2e^{-y^2} y^{2l-1}/\Gamma(l)$ $(0 < y < \infty)$
30. $x_i = i^2$ $(i = 0, 1, 2, \ldots)$; $f_i = e^{-\mu}\mu^i/i!$. Use (5.9.5).
6.8 


1. Let X: no. of the ball, Y: colour no.; $x_i = i$ (i = 0, 1, ..., 8), $y_j = j$
(j = 0, 1, 2). Spectrum of (X, Y) is given by $(x_i, y_j) = (i, j)$ (i = 0, 1, ...,
8; j = 0, 1, 2); $f_{ij} = 1/9$ for (i = 0, 1, 2, 3; j = 0), (i = 4, 5, 6; j = 1),
(i = 7, 8; j = 2) and 0 for the rest of the values of i, j. $f_{xi} = f_{i\cdot} = 1/9$ for
all i; $f_{yj} = f_{\cdot j} = 4/9, 1/3, 2/9$ for j = 0, 1, 2 respectively. The required
conditional distribution: $f_{i|0} = f_{i0}/f_{y0} = 1/4$ for all i.

2. Let X: ball no., Y: colour no. $x_i = i$ (i = 0, 1, 2, 3), $y_j = j$
(j = 0, 1, 2). Spectrum of (X, Y): $(x_i, y_j) = (i, j)$ (i = 0, 1, 2, 3;
j = 0, 1, 2). $f_{ij} = 1/12$ for all i, j; $f_{i\cdot} = 1/4$ for all i, $f_{\cdot j} = 1/3$ for all j
so that $f_{ij} = f_{i\cdot} f_{\cdot j}$ for all i, j. Hence etc.

3. $(x_i, y_j) = (i, j)$ (i = 1, 2, ..., 13; j = 1, 2, 3, 4). $f_{ij} = 1/52$ for all
i, j.

4. $(x_i, y_j) = (i, j)$ (i = 0, 1, 2, ...; j = 0, 1, 2, ..., n); $f_{ij} = e^{-\mu} \frac{\mu^i}{i!} \binom{n}{j} p^j (1 - p)^{n-j}$

5. $f_x(x) = \sin x$ $(0 < x < \pi/2)$, $f_y(y) = \sin y$ $(0 < y < \pi/2)$

6. By (6.3.6), $1 = K \int_{x=0}^{1} \int_{y=0}^{x} xy\,dx\,dy = K/8$ so that K = 8. $f_x(x) = 4x^3$ $(0 < x < 1)$, $f_y(y) = 4y(1 - y^2)$ $(0 < y < 1)$.

7. 3/8, 5/24, 5/8, 3/5. Draw diagrams.

8. $(3x^2 - 8xy + 6y^2)/(6y^2 - 4y + 1)$ $(0 < x < 1, 0 < y < 1)$,
$(3x^2 - 8xy + 6y^2)/(3x^2 - 4x + 2)$ $(0 < x < 1, 0 < y < 1)$

9. $2x$ $(0 < x < 1)$, $2(1 - y)$ $(0 < y < 1)$; $1/(1 - y)$ $(y < x < 1)$,
$1/x$ $(0 < y < x)$; 1/2
10. fly) = yle] Ja, fy) = е2 Ја. By (6.5.8) f(x, у) 


=| y |en so that fa(x) =at f [ylen dy Zant x 


EE 


y e v0 d y = 3s (x? +a’). 


ceg 


11. Let X, Y denote the random points. (X, Y) is uniformly
distributed over the unit square R: 0 < x < 1, 0 < y < 1. The
required event is |X - Y| < k or Y > X - k if X > Y and Y < X + k
if Y > X, i.e. (X, Y) lies in the part R’ of R lying between $y = x \pm k$.
Hence $R' = 1 - (1 - k)^2 = k(2 - k)$ so that by (6.4.2) the required
probability $= R'/R = k(2 - k)$.

12. (X, Y) is uniformly distributed over the unit square R:
0 < x < 1, 0 < y < 1 and the event in question is XY < k, i.e. (X, Y)
lies in R’: xy < k. Hence $R' = k + k \int_k^1 dx/x = k(1 - \log k)$ etc.

13. (p, q) is uniformly distributed over R: -1 < p < 1,
-1 < q < 1, and the required event is $p^2 > q$, i.e. (p, q) lies in R’:
$p^2 > q$. Hence R = 4, $R' = 2 + 2\int_0^1 p^2\,dp = 8/3$, and the answer = 2/3.

14. For 0 < x < 1, $f_x(x) = \int_{-(1-x)}^{1-x} \tfrac12\,dy = 1 - x$, and for -1 < x < 0,
$f_x(x) = \int_{-(1+x)}^{1+x} \tfrac12\,dy = 1 + x$. Similarly for 0 < y < 1, $f_y(y) = 1 - y$ and
for -1 < y < 0, $f_y(y) = 1 + y$.

15. $2\sqrt{a^2 - x^2}/\pi a^2$ $(-a < x < a)$, $2\sqrt{a^2 - y^2}/\pi a^2$ $(-a < y < a)$,
$1/2\sqrt{a^2 - x^2}$ $(-\sqrt{a^2 - x^2} < y < \sqrt{a^2 - x^2})$

16. Choosing A as the origin, let X, Y be the co-ordinates of P, Q
respectively. X, Y are independent, (X, Y) uniformly distributed over
the rectangle R: 0 < x < a, a < y < a + b. Required event: X <
(a + b)/2, Y > (a + b)/2 and Y < X + (a + b)/2, i.e. (X, Y) lies in the




triangular region R’: x < (a + b)/2, y > (a + b)/2, y < x + (a + b)/2.
$R = ab$, $R' = \tfrac12 b^2$. Required probability $= b/2a$.

17. Let ABCD be a tile which is a parallelogram and diagonal
AC = l. Take A’ on AC such that AA’ = c, and so A’C = l - c. Draw
A’B’ || AB, A’D’ || AD. The end of the stick towards C is uniformly
distributed over R: ABCD, and the required event means that the same
end lies in R’: A’B’CD’. Since $R'/R = (l - c)^2/l^2$ the first result
follows.

Draw parallelogram $A_1B_1C_1D_1$ inside the parallelogram ABCD
whose sides are parallel to the sides of ABCD and such that the
annular region has width d/2. The distribution of the centre of the
circle is uniform over R: ABCD and the required event means that
it lies in R’: $A_1B_1C_1D_1$ so that the answer $= R'/R = (1 - d/a)(1 - d/b)$.

18. Let X, Y: angles made by OP, OQ respectively with OA, O
being the centre of the circle. Then X, Y are independent, and (X, Y)
has uniform distribution over the square: $0 < x < 2\pi$, $0 < y < 2\pi$.
Consider the case X < Y. The required event means either $Y < \pi$ or
$Y - X > \pi$, i.e. (X, Y) lies in the triangular region: $x > 0$, $y < \pi$,
$x < y$ or the triangular region: $x > 0$, $y < 2\pi$, $y > x + \pi$, each having
area $\pi^2/2$. Answer = 1/2.


19. Probability mass between the ellipses $\lambda$ and $\lambda + d\lambda$ is maximum
(for fixed $d\lambda$) when $\lambda e^{-\lambda^2/2(1 - \rho^2)}$ is maximum, which is so when $\lambda^2 =$
$1 - \rho^2$. By Sec. 6.4(b) answer $= 1/\sqrt{e}$.


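20. Set $u = (x - m_x)\cos\alpha + (y - m_y)\sin\alpha$, $v = -(x - m_x)\sin\alpha + (y - m_y)\cos\alpha$;
$\partial(u, v)/\partial(x, y) = 1$. Define $\sigma_u$, $\sigma_v$ by
$\sigma_u^2 = \sigma_x^2\cos^2\alpha + \sigma_y^2\sin^2\alpha + 2\rho\sigma_x\sigma_y\sin\alpha\cos\alpha$;
$\sigma_v^2 = \sigma_x^2\sin^2\alpha + \sigma_y^2\cos^2\alpha - 2\rho\sigma_x\sigma_y\sin\alpha\cos\alpha$.
Assume $\tan 2\alpha = 2\rho\sigma_x\sigma_y/(\sigma_x^2 - \sigma_y^2)$. Then
$\sigma_u^2 + \sigma_v^2 = \sigma_x^2 + \sigma_y^2$, $\sigma_u^2 - \sigma_v^2 = (\sigma_x^2 - \sigma_y^2)\sec 2\alpha$
so that $4\sigma_u^2\sigma_v^2 = (\sigma_u^2 + \sigma_v^2)^2 - (\sigma_u^2 - \sigma_v^2)^2$
$= 4\sigma_x^2\sigma_y^2 - (\sigma_x^2 - \sigma_y^2)^2\tan^2 2\alpha = 4(1 - \rho^2)\sigma_x^2\sigma_y^2$ giving
$\sigma_u\sigma_v = \sigma_x\sigma_y\sqrt{1 - \rho^2}$.

Now
$$\frac{1}{1 - \rho^2}\left\{\frac{(x - m_x)^2}{\sigma_x^2} - \frac{2\rho(x - m_x)(y - m_y)}{\sigma_x\sigma_y} + \frac{(y - m_y)^2}{\sigma_y^2}\right\} = \frac{u^2}{\sigma_u^2} + \frac{v^2}{\sigma_v^2}.$$
By (6.6.1) $f_{u,v}(u, v) = \frac{1}{\sqrt{2\pi}\,\sigma_u} e^{-u^2/2\sigma_u^2} \cdot \frac{1}{\sqrt{2\pi}\,\sigma_v} e^{-v^2/2\sigma_v^2}$, etc.

21. U = X + Y, V = X; u = x + y, v = x; x = v, y = u - v. As x, y
range from 0 to 1, u ranges from 0 to 2 and v from 0 to 1. By (6.6.2)
$f_u(u) = \int_{-\infty}^{\infty} f(v, u - v)\,dv$. Now $f(v, u - v) = u$ for 0 < v < 1,
0 < u - v < 1 and 0 otherwise. Hence if 0 < u < 1, $f_u(u) = u\int_0^u dv = u^2$,
and if 1 < u < 2, $f_u(u) = u\int_{u-1}^{1} dv = u(2 - u)$.

22. $u$ $(0 < u < 1)$, $2 - u$ $(1 < u < 2)$; $1 - |u|$ $(-1 < u < 1)$;
$-\log u$ $(0 < u < 1)$

23. $X = U\cos V$, $Y = U\sin V$; $x = u\cos v$, $y = u\sin v$; $\partial(x, y)/\partial(u, v) = u$.
Since X, Y are independent standard normal variates, by (6.6.1)
$f_{u,v}(u, v) = u e^{-(x^2 + y^2)/2}/2\pi = u e^{-u^2/2}/2\pi$. Hence $f_u(u) = u e^{-u^2/2}$
$(0 < u < \infty)$, $f_v(v) = 1/2\pi$ $(0 < v < 2\pi)$.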

24. Set $X_1 = U\cos V$, $X_2 = U\sin V$; $x_1 = u\cos v$, $x_2 = u\sin v$. As
$x_1$, $x_2$ range from 0 to $\infty$, u ranges from 0 to $\infty$ and v from 0 to $\pi/2$.
Answer: $2u^3 e^{-u^2}$ $(0 < u < \infty)$.

25. Set $U = X_2/X_1$, $V = X_1$; $u = x_2/x_1$, $v = x_1$; u, v both range
from 0 to $\infty$. $f_u(u) = (1 + u)^{-2}$ $(0 < u < \infty)$. $Z = X_1/(X_1 + X_2)$
$= 1/(1 + U)$ has spectrum (0, 1), and $f_z(z) = f_u(u)\left|\frac{du}{dz}\right| = 1$ $(0 < z < 1)$.


26. Set U=XY, V=X; и=ху, у=х; Хеу, у= u/y. и ranges from 
— e to e and у from -1tol. Letu>0. Thenv=x>0 so that 


хау 2 


y ranges from 0 to 1 only and =-—y<0 everywhere. Hence 


= 


1 
2u a 
Sifu) = u {l POTEST q dv- e| Ju 
For u < 0, /,,(u) has the same expression. 
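27. U = X + Y where X, Y are independent Poisson variates having
parameters $\mu_1$, $\mu_2$ respectively. Spectrum of U is given by $u_k = k$
(k = 0, 1, 2, ...), and
$$f_{u_k} = \sum_{i=0}^{k} e^{-\mu_1}\frac{\mu_1^i}{i!} \cdot e^{-\mu_2}\frac{\mu_2^{k-i}}{(k-i)!} = e^{-(\mu_1 + \mu_2)}\frac{(\mu_1 + \mu_2)^k}{k!}.$$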
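The convolution in answer 27 can be verified numerically for particular parameters. In this sketch the values 1.5 and 2.5 are illustrative only and the function names are hypothetical.

```python
import math

def poisson_pmf(mu: float, k: int) -> float:
    """P(X = k) for a Poisson variate with parameter mu."""
    return math.exp(-mu) * mu ** k / math.factorial(k)

def sum_pmf(mu1: float, mu2: float, k: int) -> float:
    """P(X + Y = k) for independent Poisson variates, by direct convolution."""
    return sum(poisson_pmf(mu1, i) * poisson_pmf(mu2, k - i) for i in range(k + 1))

mu1, mu2 = 1.5, 2.5  # illustrative parameters
diffs = [abs(sum_pmf(mu1, mu2, k) - poisson_pmf(mu1 + mu2, k)) for k in range(10)]
print(max(diffs))
```

The largest discrepancy is at the level of floating-point rounding, consistent with the sum being a Poisson variate with parameter μ₁ + μ₂.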




28. U=X+Y. Then by (6.6.3) 
sod- | 


dv ЖЕ 
Р [@= 41)? ta] u-u) xig] OS 9 < ) 
Ai f dt VM 
“2 | econ ami in is 


=% 


Set 
1 At+B _ A(t-a)+C 
(0° +447) [r7 а) +207] 42,2 Ua) 4, 
Then a A-B+C=0, A(a? — 3,? 4 4,?) - 24B - 0, ah’ A^ 
B(a*2,7)-2,?C-1 giving Ala? +232322) +2аС=0, А lat + 
* 2a* (4? + 20%)  (4,? —2,2)*] 224 or A [a? + (21  2,)?] [a? + (21 — 2,)?] 
—2a. Hence 
. 2 2 со 
м-н og A] + las-o 
First term vanishes, and so Su) = (As + Aa)/al(u— u, — uz) + (% +4s)?]. 
XX, t X^ X, isa Cauchy (n2, пи) variate by the first part, 
Set = Хуп; x — x[n. Hence 
ж ах п? A 
f 09-0) E > rg cd “aE E] 
29. Set U=X+Y+Z, V- x4 Y, W-X; иех+у+2, vexty, 
Wcx;Xewyery-wz-u-y; RR -]1. 


Аи) = | | fiw, v-w,u- v) dv dw 


-oo 


Hence 


u 


6 1 3u? 
“Waa [Ф| dni (0 < u <œ) 


30. беу, =x, Үз=Х,Х»,...Ү„= X, X. 
++ Vn=X1Xq...%,_ 50 that x, =}, Xo = у„/у,, 
function of У, is 


“Ха; yim Xi, Y2 =X 1X gy 
-Xn = Ya[yn-i. Density 


oc oo ос 
н dyidy,...dys.., 
Í J Го, Val Ja; Pal Yuna) Уйун 
= ||... | dy dy... ys, 
JiJa...Jn-4 


the range of integration being Уз<у1<1, у»<у„<1, e <1. 


31. For a < x < b, $(X > x) = (X_1 > x, X_2 > x, \ldots, X_n > x)$ so that
$P(X > x) = P(X_1 > x) \cdots P(X_n > x) = (b - x)^n/(b - a)^n$ or $F(x) =$
$1 - (b - x)^n/(b - a)^n$. Hence $f(x) = F'(x) = n(b - x)^{n-1}/(b - a)^n$.


7.14 
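1. X: number of white balls. (a) X is binomial $(n, N_1/(N_1 + N_2))$
so that $E(X) = nN_1/(N_1 + N_2)$. (b) $P(X = i) = \binom{N_1}{i}\binom{N_2}{n-i} \big/ \binom{N_1 + N_2}{n}$.
Then $\sum_i \binom{N_1}{i}\binom{N_2}{n-i} \big/ \binom{N_1 + N_2}{n} = 1$, the summation being over all
possible values of i. $E(X) = \sum_i i\binom{N_1}{i}\binom{N_2}{n-i} \big/ \binom{N_1 + N_2}{n} = \frac{nN_1}{N_1 + N_2} \sum_i \binom{N_1 - 1}{i - 1}\binom{N_2}{n-i} \big/ \binom{N_1 + N_2 - 1}{n - 1} = \frac{nN_1}{N_1 + N_2}$.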

ә. 2а*[3,а 

3. 21а; 21а JI-y? (0 <у<1) 


4. See Ех, 7 Sec. эло. Dy dpt UE M Xe 8. | 


wt "( 1 DE en (n+ Dans jJ Jara Replacing n by n+ 1 


n 


in the identity Hes Nay =1 we get i Spies 
t=0 i=0 
-i n wet ; - 
G7 7 "UIT eG) 2G) 


n-1l , ке " 
-( к З)" Ааны m (5) (3) 
-]. 
5. See Ex. 8 Sec. 5.10 and Ex. 8 below.
e I(-3/rQ) 
7. (a+b)/2, b- a) A2 


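8. Differentiate the identity $\sum_{i=0}^{\infty} x^i = (1 - x)^{-1}$ once and twice and
put $x = \mu/(1 + \mu)$. Using these results
$$m = \sum_{i=0}^{\infty} i\,\frac{\mu^i}{(1 + \mu)^{i+1}} = \mu, \qquad \alpha_2 = \sum_{i=0}^{\infty} i^2\,\frac{\mu^i}{(1 + \mu)^{i+1}} = 2\mu^2 + \mu$$
so that $\sigma^2 = \alpha_2 - m^2 = \mu(\mu + 1)$.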
9. е-е 
10. 3, 15, – 86 
11. 1,1/5,0 
16. Il 1).- (L-k- 1) Н Іт ; 
(rm -e-mc1e--xmik- 1) (m) --mx1) 
17. Je, Me, Je(e- 1), (1—e:2/2)/ Je-1 
ASS filfi-1- (u*-1-i)|(--1). Hence argue. 
19. 1/(1-:Ja), k lat 
20. иар; =0, Hak =a®*/(2k +1) 


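21. $\alpha_1 = \mu$, $\alpha_2 = \mu + 2\mu^2$, $\alpha_3 = \mu + 6\mu^2 + 6\mu^3$, $\alpha_4 = \mu + 14\mu^2 + 36\mu^3 + 24\mu^4$;
$\kappa_1 = \mu$, $\kappa_2 = \mu + \mu^2$, $\kappa_3 = \mu + 3\mu^2 + 2\mu^3$, $\kappa_4 = \mu + 7\mu^2 + 12\mu^3 + 6\mu^4$;
$\gamma_1 = (2\mu + 1)/\sqrt{\mu(\mu + 1)}$, $\gamma_2 = (6\mu^2 + 6\mu + 1)/\mu(\mu + 1)$

22. $\chi(t) = e^{i\mu t}/(1 + \lambda^2 t^2)$; $\kappa_1 = \mu$, $\kappa_2 = 2\lambda^2$, $\kappa_3 = 0$, $\kappa_4 = 12\lambda^4$; $m = \mu$,
$\sigma = \sqrt{2}\lambda$, $\gamma_1 = 0$, $\gamma_2 = 3$

23. 1, 1, 1

24. 2

25. 4, 10, 3

26. 4, $4\log_e 2$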


8.16 
1. 2a"/(n + 1)(n + 2) 
2. E[max(X, Y)}= | | max(x, yyf(x, у) dx dy 


œ о 
1 _(®*-?рлу+у2) 
"gc? | 40 ае Pts 
= m 




8. E{min (| Xil. | Xs1)t 


SEDET: 


ос 


=n | | wes | Ixile) eln) ахах [ ф/(х) = ф(х) 


—00/%,) iz, 
=n 2" | | E f х\ф(х,)--4(х„) dx dx, 
02, 2, 


ос 


=n 2" [ааба 
0 


-» [n-ecorax, 
0 


4. Spectrum: (x_i, y_j) = (i, j) (i, j = 0, 1, 2); f_ij = 1/6 for all i, j
except that f_12 = f_21 = f_22 = 0. m_x = m_y = 2/3, α_20 = α_02 = 1, α_11 = 1/6.
Hence σ_x = σ_y = √5/3, ρ = -1/2. Regression lines: x + 2y = 2, 2x + y = 2.
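The figures quoted in Ex. 4 can be reproduced with exact rational arithmetic. The sketch below is only a check; the helper name `moment` is a hypothetical one, and the six points are those of the spectrum with non-zero probability.

```python
from fractions import Fraction as Fr

# The six points carrying probability 1/6 (f_12 = f_21 = f_22 = 0).
points = [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (2, 0)]
prob = Fr(1, 6)

def moment(r, s):
    """alpha_rs = E(X^r Y^s) over the discrete spectrum above."""
    return sum(prob * x**r * y**s for x, y in points)

m_x, m_y = moment(1, 0), moment(0, 1)
var_x = moment(2, 0) - m_x**2
var_y = moment(0, 2) - m_y**2
cov_xy = moment(1, 1) - m_x * m_y
rho_squared = cov_xy**2 / (var_x * var_y)
slope_y_on_x = cov_xy / var_x      # slope of the regression line of Y on X

print(m_x, var_x, cov_xy, rho_squared, slope_y_on_x)
```

With slope -1/2 through (2/3, 2/3) the regression line of Y on X is x + 2y = 2, and by symmetry that of X on Y is 2x + y = 2.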

X 17 153 7 4^ 

5. уо 606-4, x-4- hy- Z); p= 17 430/100; Je] is. 

a measure of goodness of fit. 


6. X = number of heads, Y = longest run of heads. Write down
all the 16 event points and the corresponding values of X and Y.
Spectrum: (x_i, y_j) = (i, j) (i, j = 0, 1, 2, 3, 4); f_00 = 1/16, f_11 = 4/16,
f_21 = 3/16, f_22 = 3/16, f_32 = 2/16, f_33 = 2/16, f_44 = 1/16, f_ij = 0 for other
values of i, j. m_x = 2, m_y = 27/16, σ_x² = 1, σ_y² = 247/256, ρ = 14/√247.
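The enumeration of the 16 event points asked for in Ex. 6 can be sketched in a few lines; the function name `longest_run` is an illustrative assumption.

```python
from fractions import Fraction as Fr
from itertools import product

def longest_run(seq):
    """Length of the longest unbroken run of heads in a sequence of tosses."""
    best = current = 0
    for toss in seq:
        current = current + 1 if toss == "H" else 0
        best = max(best, current)
    return best

outcomes = list(product("HT", repeat=4))          # the 16 event points
p = Fr(1, len(outcomes))
pairs = [(seq.count("H"), longest_run(seq)) for seq in outcomes]

m_x = sum(p * x for x, _ in pairs)
m_y = sum(p * y for _, y in pairs)
var_x = sum(p * x * x for x, _ in pairs) - m_x**2
var_y = sum(p * y * y for _, y in pairs) - m_y**2
cov_xy = sum(p * x * y for x, y in pairs) - m_x * m_y
rho_squared = cov_xy**2 / (var_x * var_y)

print(m_x, m_y, var_x, var_y, cov_xy, rho_squared)
```

Here ρ² = 196/247, which agrees with ρ = 14/√247.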


% 27/73 
8. m,= 3/2, my=1, a3, = 3/2, ш1=0 
9. Y-X-1 
10. 7/12, 7/12, „/11/12, /11/12,- 1/11. Regression lines : 
7 Life a gii efie 27 
z-p= -i-p Pa 1 (6-15) 
Regression curves : 


3x42 „_ Зу+2_ 
Y= 300x 41) *~ 3(2у+1) 




11. Regression curves : 
= 3х*-16х+9  .  36y*-32y49 
У" 6Gx¥— 4x42) = 12(бу#-4у+1ў 
Regression lines : 
-3=- 205-2 57-3 (7-3) 
ЪЗ ЕЕ p)*-3 3207 3 

13. The point of intersection is (3, 3) which gives m_x = 3, m_y = 3,
ρσ_y/σ_x = -1/6, ρσ_x/σ_y = -2/3 so that ρ² = 1/9 or ρ = -1/3 as ρ < 0.

14. n = 13. X_i = point in the ith drawing (i = 1, 2, ... n). From
symmetry X_i takes values 1, 2, 3, 4 each with probability 1/4 so that
E(X_i) = 5/2. S_n = X_1 + ... + X_n is the total number of points; E(S_n)
= E(X_1) + ... + E(X_n) = 5n/2 = 65/2.

15. a*[8 

16. 0 ≤ E[(X - kY)²] = E(X²) - 2kE(XY) + k²E(Y²). Putting k =
E(XY)/E(Y²) the inequality follows, provided E(Y²) ≠ 0. If E(Y²) = 0,
then Y has a one-point distribution at y = 0 so that E(XY) = 0 and the
equality holds.

The inequality for X*, Y* gives the second conclusion. 

17. U = aX + bY, V = cY; m_u = am_x + bm_y, m_v = cm_y; σ_u² = a²σ_x²
+ b²σ_y² + 2abσ_xσ_yρ, σ_v = cσ_y since c > 0. (U - m_u)(V - m_v) = ac ×
(X - m_x)(Y - m_y) + bc(Y - m_y)² so that cov(U, V) = acσ_xσ_yρ + bcσ_y² etc.

18. 410505" +b. bsc? 

(a; 20g? + 0,%,*)(а, 2с. + ba ay?) 


21. X = U cos α - V sin α, Y = U sin α + V cos α. Since U, V are
uncorrelated σ_x² = σ_u² cos²α + σ_v² sin²α, σ_y² = σ_u² sin²α + σ_v² cos²α
and tan 2α = 2ρσ_xσ_y/(σ_x² - σ_y²). Hence σ_u² + σ_v² = σ_x² + σ_y², σ_x² - σ_y²
= (σ_u² - σ_v²) cos 2α. Then (σ_u² - σ_v²)² = (σ_x² - σ_y²)² sec² 2α = (σ_x² - σ_y²)²
+ 4ρ²σ_x²σ_y² or (σ_u² + σ_v²)² - 4σ_u²σ_v² = (σ_x² + σ_y²)² - 4(1 - ρ²)σ_x²σ_y² or
σ_uσ_v = σ_xσ_y√(1 - ρ²).


22. ρ = ±1. σ_v² = sin²α σ_x² + cos²α σ_y² - 2ρ sin α cos α σ_xσ_y =
(sin α σ_x - ρ cos α σ_y)² = 0 if tan α = ρσ_y/σ_x.

23, Show that the covariance vanishes, 

24. Define random variables X_1, X_2, ... X_n on the event space of n
trials by: X_i = 0 or 1 corresponding to failure or success in the ith
trial (i = 1, 2, ... n). X_i is binomial (1, p_i) so that m_i = p_i, σ_i² = p_iq_i.
S_n = X_1 + ... + X_n denotes the number of successes in n trials, and the
trials being independent, X_1, ... X_n are mutually independent. By (8.5.2)
and (8.5.3) the results follow.

25. There are N_1 = 16 aces, kings, queens and jacks and N_2 = 36
other cards in the pack from which n = 13 are drawn without replacement.
Hence by the example of Sec. 8.5, M_n = 4, Σ_n² = 36/17.

26. Define random variables X_1, X_2, ... X_n on the event space of n
drawings by: X_i = number on the ith ticket (i = 1, 2, ... n) so that S_n
= X_1 + ... + X_n is the sum of numbers on the tickets drawn. From
symmetry X_1, X_2, ... X_n all have the same distribution. Now X_1 takes
values 1, 2, ... N each with probability 1/N so that m_1 = E(X_1) =
(N + 1)/2, σ_1² = var(X_1) = (N² - 1)/12. Hence m_i = (N + 1)/2, σ_i² =
(N² - 1)/12 (i = 1, 2, ... n). By (8.5.2) M_n = n(N + 1)/2. Consider now
the distribution of (X_1, X_2). Spectrum: (i, j) (i, j = 1, 2, ... N) and
P(X_1 = i, X_2 = j) = 1/[N(N - 1)] if i ≠ j and 0 if i = j. Hence

E(X_1X_2) = {1/[N(N - 1)]} Σ_{i≠j} ij = {1/[N(N - 1)]} [(Σ i)² - Σ i²] = (N + 1)(3N + 2)/12

Then cov(X_1, X_2) = -(N + 1)/12. From symmetry cov(X_i, X_j)
= -(N + 1)/12 (i ≠ j). Then by (8.5.3)

Σ_n² = [n(N² - 1)/12] [1 - (n - 1)/(N - 1)]
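For small N and n the results of Ex. 26 can be verified by listing every ordered drawing without replacement. The values N = 6, n = 3 below are illustrative assumptions.

```python
from fractions import Fraction as Fr
from itertools import permutations

N, n = 6, 3                                   # illustrative values
draws = list(permutations(range(1, N + 1), n))  # all equally likely ordered drawings
p = Fr(1, len(draws))

mean_sum = sum(p * sum(d) for d in draws)
var_sum = sum(p * sum(d) ** 2 for d in draws) - mean_sum**2
e_x1x2 = sum(p * d[0] * d[1] for d in draws)
cov_12 = e_x1x2 - Fr(N + 1, 2) ** 2

# Closed forms quoted in the hint.
var_formula = Fr(n * (N * N - 1), 12) * (1 - Fr(n - 1, N - 1))

print(mean_sum, var_sum, e_x1x2, cov_12, var_formula)
```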
27. X = number of matches. By (3.2.6) we have the identity


> Se 


1=0 k=0 
Ex)- -> Si nar 2 L SDa 
арена -> Oa EE D 2 


so that var(X) = 1. 

Another method. On the event space of n drawings we define
random variables X_1, X_2, ... X_n by: X_i = 1 or 0 corresponding to
match or no match in the ith drawing, so that S_n = X_1 + ... + X_n is
the number of matches in n drawings. From symmetry X_1, ... X_n are
identically distributed. P(X_1 = 1) = 1/n, and hence m_1 = E(X_1) = 1/n,
σ_1² = var(X_1) = (n - 1)/n². Hence m_i = 1/n, σ_i² = (n - 1)/n² (i = 1, 2, ... n).
By (8.5.2) M_n = 1. The two-dimensional variate (X_1, X_2) takes values
(0, 0), (1, 0), (0, 1), (1, 1) and P(X_1 = 1, X_2 = 1) = 1/[n(n - 1)], so
that E(X_1X_2) = 1/[n(n - 1)] giving cov(X_1, X_2) = 1/[n²(n - 1)]. From
symmetry cov(X_i, X_j) = 1/[n²(n - 1)] (i ≠ j). By (8.5.3) Σ_n² = 1.
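That the number of matches has mean 1 and variance 1 can be confirmed for a small n by listing all n! arrangements. In this sketch n = 5 is an illustrative choice and a match is taken to mean that position i holds item i.

```python
from fractions import Fraction as Fr
from itertools import permutations

n = 5                                        # illustrative value
perms = list(permutations(range(n)))         # all n! equally likely arrangements
p = Fr(1, len(perms))

# Number of matches (fixed points) in each arrangement.
matches = [sum(1 for i, v in enumerate(perm) if i == v) for perm in perms]

mean = sum(p * k for k in matches)
var = sum(p * k * k for k in matches) - mean**2

# Covariance of the indicators of a match in the first two drawings.
p_both = sum(p for perm in perms if perm[0] == 0 and perm[1] == 1)
cov_12 = p_both - Fr(1, n) ** 2

print(mean, var, cov_12)
```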

28. Let χ(t) be the characteristic function of each of the random
variables X_1, ... X_n and K(t) that of their sum. By (8.8.3) {χ(t)}^n = K(t)
= e^(imt - σ²t²/2), or χ(t) = e^(imt/n - σ²t²/2n) which corresponds to a normal
(m/n, σ/√n) distribution.

30. See random walk problem, Sec. 8.9. Here r = 5, n = 9, p = q = 1/2,
X = 2i - 4 = 0 if i = 2 so that by (8.9.1) the required probability = 9/128.

91. x(t,u)=Poot Piot KPF pet, If PooP11=PoiPio 
x(t, u) = Pii [розро + PasPioe + рузроле'* + Pi 2290] 

“Раа (Por + рае) (ро + Pie) i Xa) = (0, 0) = P17? (Pio + Pas) x 

(роз +Рзе"), хии) =x(0, и) ^ p.17 (Pos + Prs)(Pro + руле"). X(t, и) 
=х.(0) х1) if (р, о +PisMPortPis)=Pir OF PooP11* +р11(р:о t+ Poa) 
+ Pix? =р,; OF if Poo pio + Por + P11 =1 which is true, 


32. By (8.13.17), (8.13.18), (8.13.19) c*(V,) =E(Vy?) = EKY - co* 
-6*XWy = E(YV;) = E(Y(Y - co* - с,* X)) = aoa — €o*a01 — 6.21: 


33. See Ex. 4 above. α_30 = 5/3, α_40 = 3, α_21 = 1/6. By (8.14.4) the
normal equations are c_0* + (2/3)c_1* + c_2* = 2/3, (2/3)c_0* + c_1* + (5/3)c_2* = 1/6,
c_0* + (5/3)c_1* + 3c_2* = 1/6 which give c_0* = 1, c_1* = -1/2, c_2* = 0. Hence the
required parabola reduces to the straight line y = 1 - x/2. Since this is
also the regression line of Y on X, a measure of goodness of fit is
|ρ| = 1/2.
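The normal equations of Ex. 33 can be solved exactly as a check. The sketch assumes that (8.14.4) has the usual least-square form c_0 α_(r,0) + c_1 α_(r+1,0) + c_2 α_(r+2,0) = α_(r,1) for r = 0, 1, 2; the function `solve` is a hypothetical helper, not something taken from the text.

```python
from fractions import Fraction as Fr

def solve(matrix, rhs):
    """Gauss-Jordan elimination in exact fractions (assumes a nonsingular system)."""
    size = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(size):
        pivot = next(r for r in range(col, size) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        lead = aug[col][col]
        aug[col] = [v / lead for v in aug[col]]
        for r in range(size):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]

# Moments of the distribution of Ex. 4.
a10, a20, a30, a40 = Fr(2, 3), Fr(1), Fr(5, 3), Fr(3)
a01, a11, a21 = Fr(2, 3), Fr(1, 6), Fr(1, 6)

c0, c1, c2 = solve(
    [[Fr(1), a10, a20], [a10, a20, a30], [a20, a30, a40]],
    [a01, a11, a21],
)
print(c0, c1, c2)
```

The vanishing of c_2 is what reduces the parabola to the line y = 1 - x/2.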


34. X is normal (0, σ_x). α_20 = σ_x², α_30 = 0, α_40 = 3σ_x⁴, α_11 = ρσ_xσ_y.

α_21 = {1/[2πσ_xσ_y√(1 - ρ²)]} ∫∫ x²y exp{-[x²/σ_x² - 2ρxy/σ_xσ_y + y²/σ_y²]/2(1 - ρ²)} dx dy

so that, replacing x, y by σ_x x, σ_y y,

α_21 = {σ_x²σ_y/[2π√(1 - ρ²)]} ∫∫ x²y e^(-(x² - 2ρxy + y²)/2(1 - ρ²)) dx dy
= [ρσ_x²σ_y/√(2π)] ∫ x³ e^(-x²/2) dx = 0,

all integrals being taken from -∞ to ∞.

Normal equations: c_0* + c_2*σ_x² = 0, c_1*σ_x² = ρσ_xσ_y, c_0*σ_x² +
3c_2*σ_x⁴ = 0 which give c_0* = 0, c_1* = ρσ_y/σ_x, c_2* = 0. Hence the
regression parabola reduces to the straight line y = ρσ_y x/σ_x. Here
the regressions for the means are linear.


35. f.G)-1ix?e*, fy) = 3/(у + 1)*, oy? = 3/4, mx) = 1х. сш? 


E | | (ro) х*е-*°®®4хау-ї. By (8.15.6) nye=1/ 3. 
о 0 " 
36. Write E(g(X))-a. By (8.15.7) EiY-my, Q0) = 0. Then 
cov {g(X), Y - m,(X)] 


- |. [i097 ar ту) Дх, ») dxdy 


со 


= | E J P-e) dy=0 


-oo 00 


since the latter integral vanishes. Y—c)*—c,*X={Y-m,(X)}+ 

{m,(X)- co* — C1 *X}. Ву (8.13.15) and (8.15.7), the means of Y - m,(X) 
and m,(X)- co*-c,* X are both zero so that E((Y - c9* — c,* X)?] 
= EY - m, Q7] + Elim (X) - c9* - c.* X}] + 2 cov iY-^m,(X), 
my(X) - Co* – ci * Xl. By the first part the last term is zero, and the 
second result follows from (8.13.9) and (8.15.6). When „= |p|, 
ту(х) e co* + су*х, i.e. the regression for the mean of Y is linear. 




9.4 


1 fx) - ‘enzo geek 0 
: mg 0<х< =) 


2. f(u)— J2a mM?! u*!* e-38«Im (Q <и — c) 

3. See Ex. 1 Sec. 6.6.

4. γ_1 = √(8/n), γ_2 = 12/n

5. Z = X + Y. Since X, Y are independent, χ_z(t) = χ_x(t)χ_y(t).
Given χ_x(t) = (1 - 2it)^(-m/2), χ_z(t) = (1 - 2it)^(-(m+n)/2), it follows that χ_y(t)
= (1 - 2it)^(-n/2) etc.

6. For n > 1 the mean is zero, and hence

σ² = {1/[√n B(1/2, n/2)]} ∫_{-∞}^{∞} t² dt/(1 + t²/n)^((n+1)/2)
= [n/B(1/2, n/2)] ∫_0^∞ x^(1/2) dx/(1 + x)^((n+1)/2)

This integral converges only for n > 2, and in that case
σ² = nB(3/2, n/2 - 1)/B(1/2, n/2) = n/(n - 2)

7. For n > 1 the mean is zero, and hence

μ_4 = {1/[√n B(1/2, n/2)]} ∫_{-∞}^{∞} t⁴ dt/(1 + t²/n)^((n+1)/2)
= [n²/B(1/2, n/2)] ∫_0^∞ x^(3/2) dx/(1 + x)^((n+1)/2)
= n²B(5/2, n/2 - 2)/B(1/2, n/2) = 3n²/[(n - 2)(n - 4)] if n > 4.
By Ex. 6, γ_2 = μ_4/σ⁴ - 3 = 6/(n - 4).
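The beta-function expressions in Exs. 6 and 7 can be evaluated numerically against the closed forms n/(n - 2) and 3n²/[(n - 2)(n - 4)]. In this sketch the function names and the choice n = 10 are illustrative assumptions.

```python
from math import gamma

def beta(p, q):
    """Beta function B(p, q) = Γ(p)Γ(q)/Γ(p + q)."""
    return gamma(p) * gamma(q) / gamma(p + q)

def t_variance(n):
    """σ² = n B(3/2, n/2 - 1) / B(1/2, n/2), valid for n > 2."""
    return n * beta(1.5, n / 2 - 1) / beta(0.5, n / 2)

def t_fourth_moment(n):
    """μ_4 = n² B(5/2, n/2 - 2) / B(1/2, n/2), valid for n > 4."""
    return n**2 * beta(2.5, n / 2 - 2) / beta(0.5, n / 2)

n = 10                                         # illustrative degrees of freedom
variance = t_variance(n)
mu4 = t_fourth_moment(n)
excess = mu4 / variance**2 - 3                 # coefficient of excess γ_2

print(variance, mu4, excess)
```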
10. f.(im, in) 


10.3 


3. Number of heads is binomial (2000, 1/2) so that m = 1000,
σ² = 500. Take ε = 100 in (10.1.1).
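To see how conservative the bound of Ex. 3 is, it may be compared with the exact binomial probability. The sketch assumes that (10.1.1) is Tchebycheff's inequality in the form P(|X - m| < ε) ≥ 1 - σ²/ε², and that the event in question is 900 < X < 1100; both are inferences from the hint, not statements quoted from the exercise.

```python
from fractions import Fraction as Fr
from math import comb

n, eps = 2000, 100
mean = Fr(n, 2)                      # m = np with p = 1/2
variance = Fr(n, 4)                  # σ² = npq

# Lower bound for P(|X - m| < eps), assuming Tchebycheff's inequality.
chebyshev_lower_bound = 1 - variance / eps**2

# Exact probability that 900 < X < 1100, i.e. X = 901, ..., 1099.
exact = Fr(sum(comb(n, k) for k in range(901, 1100)), 2**n)

print(float(chebyshev_lower_bound), float(exact))
```

The exact probability lies far above the bound, in line with the normal approximation used for the same coin-tossing problem in Sec. 11.3.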


4. 17/625, 1/4 


5. Put ¿=n in (10.1.1). 
6. Σ_n² = σ_1² + ... + σ_n² < nA² so that Σ_n = o(n).


7. Take X,, Х,,...Х to be mutually independent, each binomial 
© з 
8. σ_i² = p_iq_i ≤ [(p_i + q_i)/2]² = 1/4 so that Σ_n² ≤ n/4 and Σ_n = o(n).


11.3 


1. Number of heads is binomial (n = 2000, p = 1/2); np = 1000,
√(npq) = 10√5. Taking a = -2√5, b = 2√5 in (11.1.5), the answer
= 2Φ(2√5) - 1 = 0.999992 (See Fisher and Yates: Statistical Tables).
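The value 2Φ(2√5) - 1 can also be obtained without tables. The sketch below expresses Φ through the error function of the Python standard library; the function name `phi` is an illustrative choice.

```python
from math import erf, sqrt

def phi(x):
    """Standard normal distribution function Φ(x), via the error function."""
    return 0.5 * (1 + erf(x / sqrt(2)))

n, p = 2000, 0.5
sd = sqrt(n * p * (1 - p))           # √(npq) = 10√5
b = 100 / sd                         # standardised deviation, equal to 2√5
answer = 2 * phi(b) - 1

print(b, answer)
```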


2. If the die is thrown n times, X_n, the frequency of sixes,
is binomial (n, 1/6). Given P(|X_n/n - 1/6| < .01) = .99 or
P(|X_n - n/6|/√(5n/36) < .012√(5n)) = .99, by Table 1, .012√(5n) = 2.58 or
n = 9245.
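The arithmetic of Ex. 2 is short enough to sketch directly. The table value 2.58 is the one quoted in the hint; the variable names are illustrative assumptions.

```python
u = 2.58                  # table value with 2Φ(u) - 1 = .99, as quoted in the hint
p, tolerance = 1 / 6, 0.01

# Solve tolerance * sqrt(n) / sqrt(p(1 - p)) = u for n.
n_required = (u / tolerance) ** 2 * p * (1 - p)

print(round(n_required))
```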


3. Let us consider the continuous case only, the discrete case 
being similar, For any t >0 


(x m) fi(x)dx < gl | |x- ml” fy(x)dx 


— TEn 

[e-m |> 73, |z7m,|z 73, 
e. J Ix ^ mi? f(x)dx 
XIX. kb Jk 


1 
“is, E(| Xy — m| 2j 
Hence if Liapounoff's condition is satisfied, 


0 < R.H.S. of (11.2.1) 


«ilm ы D KD ml) <0 


Tn 
к=1 


so that R.H.S. of (11.2.1) is 0, i.e. Lindeberg's condition is fulfilled 




4. Σ_n² = σ_1² + ... + σ_n², Σ_n > 0, {Σ_n} is monotonic increasing.
Hence either Σ_n → ∞ or {Σ_n} is bounded. In the latter case Σ_n < A, a
constant, for all n. Consider the continuous case. For any τ > 0

∫_{|x - m_1| > τΣ_n} (x - m_1)² f_1(x) dx ≥ ∫_{|x - m_1| > τA} (x - m_1)² f_1(x) dx = k (say)

Choose τ so small that k ≠ 0 and so k > 0, which is always possible.
Then for this τ, R.H.S. of (11.2.1) > k/A² > 0 which contradicts
Lindeberg's condition.


5. Х- (п) variate. Then Y=(X—n)/ n is the corresponding 
standardised variate which has spectrum (— Jn, œ). Set y =(x-n)/ J/n 
or x=yJntn. Л) = пее" jew (1+ y] Jn) [n Y (1+ yi JI). 
Now as n — œ, 1 + у/ /n— 1 and by (3.1.12) п" e^"[n ! — 1/ J/2n- 

Е у y? 3 
сы ы "Um 2n 6n®/? 
where ¿= £n, y) is such that 0 < & < A, constant, remembering that 
у>- n. 
feuin (1 + y] yn)" 
X al. а-а) -v2 = 
(1-5,5 6) а ene} ceno авт 
so that f(y) > ¢(y) as n — =. Then 
v у 
lim Fyy)=lim | f,0)dy =| tim 4,0) dy 


-0 


=Í 40) dy=40) 


-œ 


6. (а) Xn-x%n) variate. Then c.f. of (Х„-п)/ J/2n, x«*(t) = 
eN (1 — it J2]n) "l? = fetis (1— it JI)". Now 


- 49 з вуз 
etthm e Le it 2 - eet?) 


where 9 is a complex quantity such that |6|<|. Then 


*0- [1 «(1 „®) (2) ^ Шш" nn 
xe*()) {+ (5 +): (Е at е5 


oer" 


Б Jc» 




(b) If X, is Poisson-n variate, c.f. of (X, —7)/ Jn, х.) = 
git 4n ете! m3). омей An it | um) Now 


ае HE E us сй 
ETE E 2n + 9 Gnoe 


where 9 is a complex quantity such that |01< 1. Hence 
n(e't'sn — 1 — it] n) = — 122+ 01%]6 Jn- t? [2 and x,*(r) > et: 


12.6 
1. 4.86, 4, 4 
2. Classes : 15-20, 20-25, ... 40-45 ; 25.90, 45.44
3. 1.08, 0.92 
4. 8.20, 6.72, 0.12, — 0.58, 9, 8, 2, 11 


5. 0.79197, 0.01175, 0.26, —0.54 


13.5 
The following results will be useful in the sequel. X_1, X_2, ... X_n
are mutually independent, each having the distribution of the
population. In each sum the indices i, j, k, l run independently from 1 to n.
E(X_i - m) = 0
E{Σ(X_i - m)(X_j - m)} = ΣE{(X_i - m)²} = nσ²
E{Σ(X_i - m)(X_j - m)(X_k - m)} = ΣE{(X_i - m)³} = nμ_3
E{Σ(X_i - m)(X_j - m)(X_k - m)(X_l - m)} = ΣE{(X_i - m)⁴} + C(4, 2) Σ_{i<j} E{(X_i - m)²(X_j - m)²}
= nμ_4 + 3n(n - 1)σ⁴
E{Σ(X_i - m)²(X_j - m)} = ΣE{(X_i - m)³} = nμ_3
E{Σ(X_i - m)²(X_j - m)(X_k - m)} = ΣE{(X_i - m)⁴} + Σ_{i≠j} E{(X_i - m)²(X_j - m)²}
= nμ_4 + n(n - 1)σ⁴
E{Σ(X_i - m)³(X_j - m)} = ΣE{(X_i - m)⁴} = nμ_4




1. E(X̄) = m, σ(X̄) = σ/√n. X̄ - m = n⁻¹Σ(X_i - m) so that (X̄ - m)³
= n⁻³Σ(X_i - m)(X_j - m)(X_k - m) and (X̄ - m)⁴ = n⁻⁴Σ(X_i - m)(X_j - m) ×
(X_k - m)(X_l - m). By the above results, μ_3(X̄) = μ_3/n², μ_4(X̄) = μ_4/n³
+ 3(n - 1)σ⁴/n³.

2. S² = n⁻¹Σ(X_i - m)² - (X̄ - m)² so that S⁴ = n⁻²Σ(X_i - m)² ×
(X_j - m)² - 2n⁻³Σ(X_i - m)²(X_j - m)(X_k - m) + (X̄ - m)⁴. By the above relations
and using the value of μ_4(X̄) from the working of Ex. 1, E(S⁴)
= (n - 1)²μ_4/n³ + (n - 1)(n² - 2n + 3)σ⁴/n³. By (13.3.2) σ²(S²) = E(S⁴)
- {E(S²)}² = (n - 1)²μ_4/n³ - (n - 1)(n - 3)σ⁴/n³.
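The formula for σ²(S²) in Ex. 2 holds for any population with a finite fourth moment, so it can be tested by exhaustive enumeration on a small discrete one. The sketch below takes a fair die as the population and n = 3; both choices are illustrative assumptions, as is the helper name `sample_variance`.

```python
from fractions import Fraction as Fr
from itertools import product

values = [1, 2, 3, 4, 5, 6]            # illustrative population: a fair die
p = Fr(1, len(values))
m = sum(p * v for v in values)
mu2 = sum(p * (v - m) ** 2 for v in values)     # σ²
mu4 = sum(p * (v - m) ** 4 for v in values)

n = 3
samples = list(product(values, repeat=n))       # all equally likely samples
w = Fr(1, len(samples))

def sample_variance(xs):
    """S² with divisor n, as in the text."""
    xbar = Fr(sum(xs), len(xs))
    return sum((x - xbar) ** 2 for x in xs) / len(xs)

s2 = [sample_variance(s) for s in samples]
mean_s2 = sum(w * v for v in s2)
var_s2 = sum(w * v * v for v in s2) - mean_s2**2

formula = Fr((n - 1) ** 2, n**3) * mu4 - Fr((n - 1) * (n - 3), n**3) * mu2**2
print(mean_s2, var_s2, formula)
```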

3. M_3 = n⁻¹Σ{(X_i - m)³ - 3(X_i - m)²(X̄ - m) + 3(X_i - m)(X̄ - m)² -
(X̄ - m)³} = n⁻¹Σ(X_i - m)³ - 3n⁻²Σ(X_i - m)²(X_j - m) + 2(X̄ - m)³. Using
the above relations and the value of μ_3(X̄) from Ex. 1, we get E(M_3).

M_4 = n⁻¹Σ{(X_i - m)⁴ - 4(X_i - m)³(X̄ - m) + 6(X_i - m)²(X̄ - m)² -
4(X_i - m)(X̄ - m)³ + (X̄ - m)⁴} = n⁻¹Σ(X_i - m)⁴ - 4n⁻²Σ(X_i - m)³(X_j - m)
+ 6n⁻³Σ(X_i - m)²(X_j - m)(X_k - m) - 3(X̄ - m)⁴, from which we
get E(M_4).

Using the values of E(M_3), E(M_4), E(S⁴) (from Ex. 2), we see that
the mean values of n²M_3/[(n - 1)(n - 2)] and n²[(n + 1)M_4 - 3(n - 1)S⁴]
/[(n - 1)(n - 2)(n - 3)] are respectively κ_3 and κ_4 which lead to the second
conclusion.

Now γ_1 = κ_3/σ³, γ_2 = κ_4/σ⁴. Using the above estimates of κ_3 and
κ_4 and the unbiased estimate s² = nS²/(n - 1) of σ², we get the last
result.


4. E(X^k) = α_k, α_2(X^k) = E(X^(2k)) = α_2k, so that σ(X^k) = √(α_2k - α_k²).
X_1^k, X_2^k, ... X_n^k are mutually independent having the same mean α_k and
standard deviation √(α_2k - α_k²) if α_2k exists, whence the result follows
from the central limit theorem for equal components.


5. σ(s²) = σ(χ²)σ²/(n - 1) = σ²√(2(n - 1))/(n - 1). Note that μ_4 = 3σ⁴.

6. (a) For a binomial (N, p) population nX̄ = ΣX_i is binomial
(nN, p), since X_1, ... X_n are mutually independent each binomial (N, p).
Answer: x_i = i/n (i = 0, 1, ... nN), f_i = C(nN, i) p^i (1 - p)^(nN-i), where
C(nN, i) is the binomial coefficient.

(b) For a Poisson-μ population, nX̄ = ΣX_i is Poisson distributed
with parameter nμ. Answer: x_i = i/n (i = 0, 1, ...), f_i = e^(-nμ)(nμ)^i/i!

(c) Here nX̄ is γ(nl) variate. Answer: f(x) = n^(nl) e^(-nx) x^(nl-1)/Γ(nl)
(0 < x < ∞).

7. cov(X̄, S²) = E[(X̄ - m)S²]. Now (X̄ - m)S² = n⁻²Σ(X_i - m)² ×
(X_j - m) - (X̄ - m)³. Using the above relations and μ_3(X̄) from Ex. 1,
cov(X̄, S²) = (n - 1)μ_3/n².


14.7 


1. -l-n[log (x,xa4:-x,) 


2. For a sample of unit size L = 2(a - x)/a². ∂ log L/∂a = 0 gives
a = 2x. E(X) = a/3 so that E(2X) = 2a/3. Hence the estimate is
biased.
8. 1/(1+%) 
F; mis the population mean, and hence the required conclusion. 
X(x;- т)*[п " 
* 
7/l, т = ol which gives the conclusion, 
9. For large n, X̄ is approximately normal (μ, √(μ/n)) so that
√(n/μ)(X̄ - μ) is approximately standard normal. Then P(-u_ε < √(n/μ)(X̄ - μ) < u_ε)
≈ 1 - ε or P(A < μ < B) ≈ 1 - ε where A, B are the roots of the
equation in μ: n(X̄ - μ)² = μu_ε².

10. σ = a√l. For large n, √n(X̄ - al)/(a√l) is approximately
normal (0, 1). P(-u_ε < √n(X̄ - al)/(a√l) < u_ε) ≈ 1 - ε or P(A < a < B)
≈ 1 - ε, where A, B are the roots of the quadratic equation in a:
n(X̄ - al)² = la²u_ε² etc.

11. By (14.5.1) : 14.6, 19.2 

12. By (14.5.3) : 51.2, 62.8

By (14.5.3) m : 95%- (62.86, 65.99), length -3.13, 98%- 
“+ Я 66.41), length = 3.96. By (14.5.6) с : 9595 — (1.23, 3.83), length = 
2.60 ; 98% – (1.15, 4.44), length = 3.29 


14. By (14.6.3) : (.51, .63)
15. By (14.6.3) : .18, .25
16. By Ex. 9 above : 4.59, 4.97




17. (а) By Ex.9 : (7.6, 8.8) (b) By (14.6.4) : (7.7, 8.7) 
18. By (14.6.4) : 22. 42, 24.62 


19. Fi- Fatte yi, +n S, n;Ss*)vn,n, Where v=n, + 
"s —2 and teis given by (14.5.4). 


20. ν_1, ν_2 = observed values of X_1, X_2 respectively. For large n_1, n_2,
X_1, X_2 are approximately normal (n_1p_1, √(n_1p_1q_1)) and (n_2p_2, √(n_2p_2q_2))
respectively where q_1 = 1 - p_1, q_2 = 1 - p_2. Since p_1 ≈ ν_1/n_1, p_2 ≈ ν_2/n_2,
we can take X_1, X_2 to be approximately normal (n_1p_1, √(ν_1(n_1 - ν_1)/n_1))
and (n_2p_2, √(ν_2(n_2 - ν_2)/n_2)) respectively, so that

[(X_1/n_1 - X_2/n_2) - (p_1 - p_2)]/√[ν_1(n_1 - ν_1)/n_1³ + ν_2(n_2 - ν_2)/n_2³]

is approximately standard normal etc.


15.5 
1. r = 0.935 ; y - 8.65 = 0.879 (x - 9.44), x - 9.44 = 0.995 (y - 8.65)

2. r = 0.503 ; y - 71.4 = 0.496 (x - 49.8), x - 49.8 = 0.510 (y - 71.4)
3. y = 129.42 + 5.367x (|r| = 0.950)
y = 125.20 + 8.986x - 0.4524x² (Ry = 0.968)
4. (a) y = -1.2249 + 1.3396x (|r| = 0.9988)
(b) y = 0.18488 + 1.0762x + 0.0080502x² (Ry = 0.9995)
(c) y = 6.4109 + 0.039812x² (Ry = 0.9885)
5. y = x² - 8.13x + 19.56
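The answers to Exs. 1 and 2 above can be checked for internal consistency: for least square regression lines the product of the two regression coefficients equals r². The following sketch uses only the coefficients quoted in those two answers; the variable names are illustrative.

```python
from math import sqrt

# (slope of y on x, slope of x on y, quoted r) for Exs. 1 and 2 of this section.
answers = {
    1: (0.879, 0.995, 0.935),
    2: (0.496, 0.510, 0.503),
}

recomputed = {ex: sqrt(b_yx * b_xy) for ex, (b_yx, b_xy, _) in answers.items()}
print(recomputed)
```

Both recomputed values agree with the quoted r to three decimal places.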


16.9 


1. Statistic : χ² = Σ(x_i - m)²/σ_0² whose sampling distribution
under H_0 is χ²-distributed with n D.F. A left-tailed test for H_1 :
σ < σ_0 and a right-tailed test for H_1 : σ > σ_0.


2. Two-tailed χ²-test where χ² is the same as in Ex. 1.


3. H_0 : m = 15.5 against no alternative. By (16.6.1) u = 1.20. For
ε = .05, by (16.6.2) C.R. : |u| > 1.96. H_0 is accepted.

4. H_0 : m = .05 against no alternative. By (16.6.3) t = -4.819.
Two-tailed t-test. For ν = 11, P(|t| ≥ 4.819) < .001. Hence the value
of t is highly significant, and we reject H_0.




5. H_0 : m = 0 against no alternative. By (16.6.3) t = 2. Two-tailed
t-test. For ε = .05, ν = 9, by (16.6.4) C.R. : |t| > 2.262. Accept H_0.

6. H_0 : m = 65 = m_0 against H_1 : m < m_0. By (16.6.3) t = -0.870.
Left-tailed t-test. For ν = 7, P(t < -0.870) = .21. Accept H_0.

7. H_0 : σ = 5.2 against no alternative. By (16.6.5) χ² = 24.882.
Two-tailed χ²-test. For ε = .05, ν = 19, by (16.4.9) C.R. : 0 < χ² < 8.825
and χ² > 33.096. H_0 is accepted.

8. H_0 : σ = 0.1 = σ_0 against H_1 : σ > σ_0. By (16.4.5) χ² = 21.560.
Right-tailed χ²-test. For ε = .05, ν = 10, by (16.4.6) B.C.R. : χ² > 18.307.
Since the observed χ² lies in the critical region, accept H_1.

9. H_0 : m_1 = m_2 against no alternative. By (16.7.3) u = 1.16. For
ε = .01, by (16.7.4) C.R. : |u| > 2.58. Accept H_0.

10. H_0 : m_1 = m_2 against no alternative. By (16.7.5) t = 1.50. For
ε = .05, ν = 18, by (16.7.6) C.R. : |t| > 2.10. H_0 is accepted.

11. H_0 : m_1 = m_2 against H_1 : m_1 > m_2 assuming σ_1 = σ_2. Right-tailed
t-test ; t = 4.384 by (16.7.5). For ν = 18, P(t > 4.384) < .001 ;
value of t is highly significant so that H_0 is rejected. Since the second
sample has a greater value of s², by interchanging the two populations
H_0 : σ_1 = σ_2 against no alternative. Two-tailed F-test where F = 1.27
by (16.7.7). For ε = .1, ν_1 = ν_2 = 9, by (16.7.8) the right critical interval =
F > 3.19. H_0 is confirmed.

12. H_0 : m_1 = m_2 assuming σ_1 = σ_2 against no alternative. By
(16.7.5) t = 0.749. Two-tailed t-test. For ν = 13, P(|t| > 0.749) = .47.
Accept H_0. H_0 : σ_1 = σ_2 against no alternative. Two-tailed F-test
where F = 2.31 by (16.7.7). For ν_1 = 7, ν_2 = 6, ε = .1, by (16.7.8) C.R. =
F > 4.22. H_0 is accepted.

13. H_0 : σ_1 = σ_2 against H_1 : σ_1 > σ_2. By (16.7.7) F = 4.93.
Right-tailed F-test. For ε = .01, ν_1 = 24, ν_2 = 14, by (16.7.8) C.R. :
F > 3.43. H_0 is rejected.

14. H_0 : ρ = 0 against H_1 : ρ < 0. Left-tailed t-test where t = -2.688
by (16.8.1). For ν = 4998, P(t < -2.688) = .004. Reject H_0.

15. H_0 : ρ = 0 against no alternative. Two-tailed t-test where
t = 2.178 by (16.8.1). For ν = 14, P(|t| > 2.178) = .048 ; value of t is
simply significant so that we reject H_0, but not very confidently.




17.7 


1. H_0 : p = 1/6 against no alternative. By (17.1.1) u = 4.19.
Two-tailed standard normal test. P(|U| > 4.19) < .001 ; value of
u is highly significant. Reject H_0.


2. H_0 : p = 1/4 ; no alternative. By (17.1.1) u = 2.17. For ε = .05
by (17.1.2), C.R. : |u| > 1.96. Reject H_0.


3. Ho: p-i-p, against H, DP pe By (17.1.1) w= 242. 
Right-tailed standard normal test. F(U > 2.42) =.008 ; value of u is 
highly significant and so H, is rejected. 


4. H_0 : p = 0.1 ; no alternative. By (17.1.1) u = 2.17. For ε = .01,
by (17.1.2), C.R. : |u| > 2.58. Accept H_0.


5. First coin : H_0 : p_1 = 1/2 ; no alternative. By (17.1.1) u =
-2.91. Two-tailed standard normal test. P(|U| > 2.91) = .004 ; value
of u is highly significant. Reject H_0.

Second coin : H_0 : p_2 = 1/2 ; no alternative. By (17.1.1) u = -3.47.
Two-tailed standard normal test. P(|U| > 3.47) < .001 so that H_0 is
rejected.

H_0 : p_1 = p_2 against no alternative. By (17.2.1) p = 0.3295, and
by (17.2.2) u = 0.20. Two-tailed test. P(|U| > 0.20) = .84 ; value of
u is not significant at all. Accept H_0.


6. H_0 : p_1 = p_2 against no alternative. By (17.2.1) p = 0.7832,
by (17.2.2) u = 4.36. Two-tailed test. P(|U| > 4.36) < .001. Reject H_0.


7. H_0 : μ = 8 ; no alternative. By (17.3.1) u = 0.66. Two-tailed
test. P(|U| > 0.66) = .51 ; value of u is not significant at all. Accept H_0.


8. H_0 : μ = 1/5 ; no alternative. By (17.3.1) u = -0.98. Two-tailed
test. P(|U| > 0.98) = .33 ; value of u is not significant. Accept H_0.
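The tail probabilities quoted in Exs. 3, 5, 7 and 8 of this section can be recomputed from the complementary error function instead of being read from tables. This is a sketch only; the function names are illustrative assumptions.

```python
from math import erfc, sqrt

def upper_tail(u):
    """P(U > u) for a standard normal variate U."""
    return 0.5 * erfc(u / sqrt(2))

def two_tailed(u):
    """P(|U| > |u|) for a standard normal variate U."""
    return 2 * upper_tail(abs(u))

# Observed |u| and the two-tailed probabilities quoted in the answers.
quoted = {0.20: 0.84, 0.66: 0.51, 0.98: 0.33, 2.91: 0.004}
computed = {u: two_tailed(u) for u in quoted}

print(computed, upper_tail(2.42))
```

Rounded to the number of places used in the answers, the computed values reproduce the quoted ones, including the right-tailed value .008 for u = 2.42.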


9. H_0 : p_k = 1/6 (k = 1, 2, ... 6). Right-tailed χ²-test where χ²
= 48.464 by (17.5.1). For ν = 5, P(χ² > 48.464) < .001 ; value of
χ² is highly significant. Reject H_0.


10. H_0 : p_1 = 9/16, p_2 = 3/16, p_3 = 1/4. Right-tailed χ²-test where
by (17.5.1) χ² = 3.70. For ε = .05, ν = 2, by (17.5.2) C.R. : χ² > 5.99.
Accept H_0.




її. Ho:p,-.l pa=.5, p.74. By (17.5.1) x* -1.206.— Right- 
tailed x?-test. For у=2, P(x? > 1.296) =.53; value of x? is not 
significant at all. Accept Ho. 

12. (a) Combine the last 3 entries of the table. χ² = 20.09. For
ν = 3, P(χ² > 20.09) < .001 ; value of χ² is highly significant.
Population distribution is not binomial (5, 1/6).

(b) Combine the last 3 entries of the table. p is replaced by
p̂ = 0.216, χ² = 1.045. For ν = 2, P(χ² > 1.045) = .60 ; value of χ² is
not significant. Population distribution is binomial (5, p).

13. The first 2 and the last 3 entries of the table are combined
together. μ is replaced by x̄ = 8.20. χ² = 2.924. For ν = 7,
P(χ² > 2.924) = .89. Population is Poissonian.

14. The first 2 and the last 2 entries of the table are combined
together. m and σ are replaced by x̄ = 0.79197 and S = 0.0117
respectively. χ² = 6.746. For ν = 6, P(χ² > 6.746) = .35. Population
is normal.


18.7 


1. From symmetry about the origin, X, Y have the same distribution,
and let the density function of each be f(x). Since X, Y
are independent, f(x, y) = f(x) f(y). Set x = r cos θ, y = r sin θ. The
distribution of (X, Y) being symmetrical about the origin, ∂f(x, y)/∂θ = 0
which gives f'(x)/xf(x) = f'(y)/yf(y) = k, a constant, so that f(x) = Ae^(kx²/2).
Now ∫_{-∞}^{∞} f(x) dx = 1 ; for convergence of this integral k < 0, and
writing k = -1/σ², we get A = 1/(√(2π) σ). Hence etc.


2. E(X)-a,m,----a,m,-m which shows that X can be re- 
garded as a measured value of a quantity whose true value is m. 
The second part follows from (8.5.8) and (18.3.1) and the third from 


(18.5.2). 

4. By (18.5.5) W-cxw,e-x-m, and by (18.5.20) w;-e;- 
W-ixw,e;--(w,W)es - + (1 -w,W)e,- -— (Wal/W)2n. Бог the 
corresponding random variables V; = — (wi /W)E3 — + +(1—w,/W)E,— 




+++ — (Wa/W)Ex. Since Ei, E.,°-E,, are mutually independent, E; being 


normal (0, c/ Jw) (1-1, 2, =m), (Уд) = D wo? [W (1 = wl)? 


kx¥i 
х о[и; = (0 – Wic*/Ww;. Hence etc. 
5. 439204, ;1 — 0.224, Ot 0.151, o (X) = 0.071, Q'(X) = 0.048. 
Confidence limits : 50% – 39.154, 39.254 ; 95% — 39.043, 39.365 
6. х=0.6793, О 


2 Qıt=0.0044, 0,.1=0,1=0,1-0.0036, Q,1- 
0.0031, 0;(¥) - 0.0018 


- Confidence limits : 0.6725, 0.6861 
7. (а) d,*- 1.59, q,* = 4.93, q,* — 1,47 

m,” = 3.59, m,* = 14.38, m,* — 26.36, m,* = 10.11 

(6) q,*—1.56, q,* = 491, q;* - 147 

m,*= 3,54, m,* = 14.31, m,* = 26.20, m,* = 10.03 


’ 


BIBLIOGRAPHY 


ARLEY, N. and BUCH, K. R.: Introduction to the theory of probability and statistics, Wiley, New York, 1950
CHUNG, K. L.: Elementary probability theory with stochastic processes, Narosa, New Delhi, 1978
CRAMER, H.: Random variables and probability distributions, Cambridge University Press, Second edition, 1961
    Mathematical methods of statistics, Princeton University Press, 1958
    The elements of probability theory and some of its applications, Wiley, New York, 1959
FELLER, W.: An introduction to probability theory and its applications, Vol. I, Wiley Eastern, Third edition, 1978
    An introduction to probability theory and its applications, Vol. II, Wiley Eastern, 1977
FISHER, R. A.: Statistical methods for research workers, Oliver and Boyd, Edinburgh, Eleventh edition, 1950
    and YATES, F.: Statistical tables, Oliver and Boyd, Second edition, 1943
GNEDENKO, B. V.: The theory of probability, Mir, Moscow, 1969
GOLDBERG, S.: Probability: An introduction, Prentice-Hall, New Jersey, 1960
HOEL, P. G.: Introduction to mathematical statistics, Asia, Bombay, Second edition, 1961
KENDALL, M. G.: The advanced theory of statistics, Vols. I-II, Griffin, London, 1945-48
KENNEY, J. F. and KEEPING, E. S.: Mathematics of statistics, Part I, Van Nostrand, Third edition, 1954
    Mathematics of statistics, Part II, Van Nostrand, Second edition, 1951
KOLMOGOROV, A. N.: Foundations of the theory of probability, Chelsea, New York, 1950
LEVY, H. and ROTH, L.: Elements of probability, Oxford University Press, 1936
LINDGREN, B. W. and McELRATH, G. W.: Introduction to probability and statistics, Macmillan, New York, 1959
MISES, R. V.: Probability, statistics and truth, William Hodge, London, 1939
MOOD, A. M. and GRAYBILL, F. A.: Introduction to the theory of statistics, McGraw-Hill, New York, Second edition, 1963
MUNROE, M. E.: Theory of probability, McGraw-Hill, New York, 1951
NEYMAN, J.: First course in probability and statistics, Holt, New York, 1953
PARZEN, E.: Modern probability theory and its applications, Wiley, New York, 1960
ROZANOV, Y. A.: Introductory probability theory, Prentice-Hall, New Jersey, 1969
SMART, W. M.: Combination of observations, Cambridge University Press, 1958
USPENSKY, J. V.: Introduction to mathematical probability, McGraw-Hill, New York, 1937
WEATHERBURN, C. E.: A first course in mathematical statistics, Cambridge University Press, Second edition, 1949
WENTZEL, E. S.: Probability theory (First steps), Mir, 1982
WHITTAKER, E. and ROBINSON, G.: The calculus of observations, Blackie, London, Fourth edition, 1944
WILKS, S. S.: Elementary statistical analysis, Princeton University Press, New Jersey, 1958
    Mathematical statistics, Princeton University Press, New Jersey, 1944


Addition rule for mean values 153 


for probabilities 18, 23 
Additivity, complete 21 
Asymptotically normal 208 
Axioms 21 


Bayes’ theorem 35 
Bernoulli’s theorem 201 
Buffon’s needle problem 105 


Cartesian product 46 
Characteristics 127, 222, 258 
Class frequency 220 
interval 220 
limits 220 
mark or midpoint 220 
Classical definition 13 
Coefficient of excess 138, 224 
of kurtosis 138, 224 
of skewness 137, 224 
Confidence coefficient 245
interval 246
limits 246
Convergence 'in probability' 198
Correlation coefficient 155, 258
ratio 182
Covariance 155, 258
Critical region 274 
best 275 
Cumulants 143, 239 
Cumulative graph 218 
Curve, density 81 
distribution 71 
Curve fitting, parabolic 179, 264 


Decile 148 
Degrees of freedom 188, 192 
Deviation, standard 135 


INDEX 


Diagram, dot 258 
frequency 219 
probability 74 
scatter 258 

Dispersion, measure of 134 

Distribution, binomial 75
beta, of the first kind (β_1) 86
» of the second kind (β_2) 86
Cauchy 85
causal 74
chi-square (χ²) 188
conditional 109
continuous 80, 101
discrete 72, 97
F 191
gamma (γ) 86
Laplace 151
log-normal 151
marginal 95
multinomial 327
normal 84, 107
of the sample 218
Pascal 150
Poisson 76
rectangular 82, 104
sampling 231
Student's (t) 192
uniform 82, 104
Equiprobability ellipses 108 
Error 316 

accidental 316 

elementary 318 

experimental 1, 315 

mean square 321 

of observation 316 

probable 321 

random 316 

root mean square 321 




Error, systematic 316 
two types of 274 _ 
Estimate 233 


consistent 233 
unbiased 233 

Estimation, interval 245 
point 245 

Event points 9 
space 9 

Events 1, 3, 9 
certain 4, 9 
complementary 5, 9 
compound 4 
impossible 4,9 
mutually exclusive 4, 9 
simple 4 

Expectation 127, 152 
conditional 168, 169 * 


Frequency, absolute 16 
conditional 18 
definition 18 
interpretation 21 
ratio 16 

relative 16 _ 

Function, characteristic 140, 164 
density 81, 101 
distribution 69, 94 
error 321 
likelihood 241, 267 
moment generating 139 
step 72 


Gaussian law 320 
Grouping of data 220 


Histogram 220 
Hypothesis, alternative 272 
composite 271 
null 272 
simple 271 
statistical 271 


Independent events 39 
random experiments 47 




Independent random variables 96 
trials 48 


Kurtosis 138 


Law, binomial 50 
multinomial 57 
normal 317 
of large numbers 202 
Least squares, principle of 173, 322,
330, 332, 335
Liapounoff's condition 212 
Likelihood equations 242, 267 
ratio 285 
» testing 283 
Limit theorem, central 209 
deMoivre-Laplace 204 
for characteristic functions 211 
Lindeberg’s condition 209 
Location, measure of 135 


Markov chain 59 
Maximum likelihood estimates 242, 
267 
method 241 
Mean, 130, 222, 258 
conditional 168, 169 
weighted 327 
Mean value 127, 152 
conditional 168, 169 
Median 145, 223 
Mode 147, 223 
Modulus of precision 321 
Moments 132, 155, 223, 258 
absolute 132 
central 132, 155, 223, 258 
Multiplication rule for mean values 
162 
for probabilities 19, 32 


Neyman-Pearson theorem 275 


Normal equations 174, 264, 322,
330, 333, 335


Parametric point 272 
space 272 

Peakedness 133 

Percentile 148 

Population 216 E 

Power of critical region 274 
of test 274 

Probability, conditional 18, 31 
differential 81, 102 
element 81 
mass 72 
of transition 61 

Process, Poisson 77 
stochastic 77 


Quantile 148 

Quartile deviation 148 
lower 148, 223 
upper 148, 223 


Random experiments 1, 2 
observations 1 
walk problem 167 
Random variable 66 
normalised 136 
standardised 136 
Range 224 
semi-interquartile 148, 224 
Regression coefficient 175, 263 
curves 169 
» least square 173 
function 169 
T" least square 174 
linear 170 
lines 175, 263 
parabola 180 
Reproductive property 117, 165 
Residual 174, 261 
Residuals of the sample 264, 323 


Sample 216, 257 
characteristics 222, 258 




mean 222, 258 

point 230 

space 230 

values 217 

variance 222, 258 
Schwartz’s inequality 185 
Semi-invariants 143 
Set 5 

empty 6 

null 6 
Significance level 275 

tests of 273 
Skewness 137 
Statistic 231 
Statistical regularity 16 
Stirling's formula 27 
Student's ratio 239 
Stochastic variable 66 
Subset 5 
Sum polygon 218 


Tchebycheff's inequality 195 
theorem 201 
Test 273 
best 275 
of goodness of fit 310 
most powerful 275 
Trials, Bernoulli 49 
» infinite 58 
Poisson 55 
repeated 48 
Transformation of random variables 
87, 113 
Transition matrix 61 


Variance 134, 222, 258 
about regression function 171 
conditional 168, 169 
residual 178 

Variate 66 


Weight 326 


Published by B. K. Dhur of Academic Publishers, 5A, Bhabani Dutta Lane, 
Calcutta - 700073 and printed at K. P. Basu Printing Works, 
11, Mahendra Gossain Lane, Calcutta - 700006, 


"uo Жс 


